ABSTRACT

This manual is designed to assist administrators of
English-as-a-second-language programs in assessing students' language
growth. It begins by reviewing some of the concepts and terminology
to be used. It then goes on to suggest and illustrate data-recording
formats and methods of summarizing raw gains. This is followed by an
example based on bowling scores to illustrate the regression effect.
An overview of a method for separating raw gain into regression and
true gain components follows. It concludes with a brief discussion of
a method for comparing two different groups with differing
backgrounds or curricula. The appendices give details of the data
and of the steps in performing the regression analyses using SPSS
(Statistical Package for the Social Sciences). (BW)

The Test of English as a Foreign Language (TOEFL) was developed in 1963 by
a National Council on the Testing of English as a Foreign Language, which
was formed through the cooperative effort of over thirty organizations,
public and private, that were concerned with testing the English
proficiency of nonnative speakers of the language applying for admission
to institutions in the United States. In 1965, Educational Testing Service
(ETS) and the College Board assumed joint responsibility for the program,
and in 1973 a cooperative arrangement for the operation of the program was
entered into by ETS, the College Board, and the Graduate Record
Examinations (GRE) Board. The membership of the College Board is composed
of schools, colleges, school systems, and educational associations; GRE
Board members are associated with graduate education.

ETS administers the TOEFL program under the general direction of a Policy
Council that was established by, and is affiliated with, the sponsoring
organizations. Members of the Policy Council represent the College Board
and the GRE Board and such institutions and agencies as graduate schools
of business, junior and community colleges, nonprofit educational exchange
agencies, and agencies of the United States government.



A continuing program of research related to TOEFL is carried out under the
direction of the TOEFL Research Committee. Its six members include
representatives of the Policy Council, the TOEFL Committee of Examiners,
and distinguished English-as-a-second-language specialists from the
academic community. Currently the committee meets twice yearly to review
and approve proposals for test-related research and to set guidelines for
the entire scope of the TOEFL research program. Members of the Research
Committee serve three-year terms at the invitation of the Policy Council;
the chair of the committee serves on the Policy Council.

Because the studies are specific to the test and the testing program, most
of the actual research is conducted by ETS staff rather than by outside
researchers. However, many projects require the cooperation of other
institutions, particularly those with programs in the teaching of English
as a foreign or second language. Representatives of such programs who are
interested in participating in or conducting TOEFL-related research are
invited to contact the TOEFL program office. Local research may sometimes
require access to TOEFL data. In such cases, the program may provide this
data following approval by the Research Committee. All TOEFL research
projects must undergo appropriate ETS review to ascertain that the
confidentiality of data will be protected.

Current (1981-82) members of the TOEFL Research Committee include the
following:

G. Richard Tucker (chair)   Center for Applied Linguistics
Louis A. Arena              University of Delaware
H. Douglas Brown            University of Illinois at Urbana-Champaign
Frances B. Hinofotis        University of California at Los Angeles
Diane Larsen-Freeman        The Experiment in International Living
David S. Sparks             University of Maryland

Introduction

A number of United States colleges and universities provide intensive
English-as-a-second-language (ESL) instruction for entering foreign
students. Generally, these programs consist of full-time English language
instruction over a period of several semesters. Students typically
graduate to English-medium instruction in their chosen fields, in the same
or another institution, only after meeting some entrance criterion score
on an English language proficiency examination, such as the Test of
English as a Foreign Language (TOEFL) or the Michigan Test of English
Language Proficiency.

This manual is designed to assist administrators of ESL programs in
assessing students' language growth. It is a guide for conducting studies
at local institutions to predict likely progress over time of students
at different entry levels. It also offers assistance in interpreting the
typical outcome; that is, students who enter with low scores frequently
show greater gains than do students who enter with higher scores. This
last issue is linked to the classical "regression to the mean" phenomenon,
which occurs with any test that is not perfectly reliable. We suggest a
method for assessing the amount by which lower-scoring students can be
expected to show relatively more apparent gain due to unreliability
in measurement. If the low scorer's "edge" in your program is in the
neighborhood of this expected value as calculated from your students'
scores, there is no evidence from the scores that the curriculum at the
more advanced level is not producing adequate results. (There could, of
course, be other bases for such a change in emphasis, but, as shall be
demonstrated, even average raw score gains of 80 points for students who
start with TOEFL scores of 300, versus raw gains of 30 points for students
who begin with scores of 500, do not necessarily mean that the curriculum
is less effective for the higher proficiency group.)

The manual begins by reviewing some of the concepts and terminology
to be used. It then goes on to suggest and illustrate data-recording
formats and methods of summarizing raw gains. This is followed by an
example based on bowling scores to illustrate the regression effect. An
overview of a method for separating raw gain into regression and true gain
components follows. It concludes with a brief discussion of a method for
comparing two different groups with differing backgrounds or curricula. A
summary of the steps in the recommended analysis is given in Figure 1.

The appendices give details of the data and of the steps in
performing the regression analyses using SPSS (Statistical Package for
the Social Sciences), a widely available statistical analysis computer
package. If SPSS is not available at your computer installation, students
or staff should be able to adapt this sample to other regression programs
with little difficulty. By working through the example and discussion
beginning on page 5, the reader should understand what these methods can
do, and be in a position to decide whether to forward the appendices to a
staff member or student familiar with SPSS to conduct analyses of local
data. If such analyses are performed, the last few pages of this section
provide an overview of the interpretation of the analyses for the
illustrative data used here. The appendices themselves (A on data layout,
B on gain analyses, and C on comparative growth of two groups) give more
detail for the use of individuals who perform the computer analyses.

Figure 1

Steps in Analysis

Beginning of Semester:         1. Administer Pretest (A)
                               2. Score Pretest (A)
                               3. Record Data

One Week Later:                4. Administer Reliability Test (B)
                               5. Score Reliability Test (B)
                               6. Record Data

End of Semester:               7. Administer Posttest (C)
                               8. Score Posttest
                               9. Record Data

Analysis:                     10. Keypunch Data
                              11. Match Scores

Estimate Baseline Equation:   12. Predict Reliability Test from Pretest
                                  (Regression Equation)

Estimate Posttest Equation:   13. Predict Posttest from Pretest
                                  (Regression Equation)

Estimate Growth:              14. For each pretest score, A, estimate
                                  expected growth by subtracting predicted
                                  reliability test score, B̂, from
                                  predicted posttest score, Ĉ, obtaining
                                  Ĉ - B̂ rather than raw gain



This manual uses illustrative data collected from 98 ESL students at
San Francisco State University. With the cooperation of faculty members
and Allis R. Bens (director of the American Language Institute at the
university), students were pretested with TOEFL before the fall semester
began, administered a reliability test one week into the semester,
and given a posttest at the end of the semester. These institutional
TOEFL administrations were scored at ETS and the scores were returned to
the university. The analyses reported here were performed at ETS.

The major impetus for the development of this manual came from
Professor Bens's gentle but firm insistence that her internal analyses of
raw gain for varying pretest score ranges could be made more rigorous
without becoming incomprehensible. The intent was to go beyond the
general nostrum that "some regression to the mean is inevitable," and to
find ways of estimating how much regression is to be expected for a given
group, and how much of the apparent change may be attributed to language
growth. Earlier versions of the manual were discussed with San Francisco
State faculty members. The current version has been revised further on
the basis of user reaction.

When we are faced with two sets of numbers, such as the
pretest and posttest scores of a group of students, we must first
summarize them if we wish to discern any general patterns and relationships
that may be concealed under the blooming, buzzing confusion of their
individual jumps and wiggles. The best way to detect and understand
underlying patterns is by summarizing and simplifying a plot of the data.
The most concise way to summarize and manipulate the data for statistical
tests is to reduce important features of the plot to the simple linear
equations of elementary analytic geometry.

Thus, we need to develop some facility in translating from plots to
formulas and back again. Suppose we have the following pairs of pretest
and posttest scores on six students:

Student      A     B     C     D     E     F

Pretest     100   200   200   300   350   400

Posttest    180   190   260   350   350   420

It is customary to plot such scores on a graph with pretest coordinates on
the horizontal axis and posttest coordinates on the vertical axis, so that
each student's pair of numbers determines a point on the grid. Following
is a plot of the points determined by these pairs of numbers.

[Plot: posttest score (vertical axis) against pretest score (horizontal
axis) for students A through F]


Clearly, these points lie very near an "uphill" straight line passing
close to A and F. The "best-fitting" straight line for these six points
could be defined in several ways, but if we wish to predict posttest from
pretest, the best-fitting straight line is defined as the line that
minimizes the sum of the squared vertical distances of the points from the
line. If you place a clear straightedge over the points and line it up
by eye to come as close as it can to having an equal number of evenly
distributed points above and below the edge, you will probably come
remarkably close to the mathematically defined "regression line of best
fit." Translating this line to a mathematical equation is straightforward.
A line can be described by two constants: the slope and the intercept.

The line in the figure cuts the posttest scale at about 80, when
extended to the vertical axis. This point corresponds to the theoretical
value of the posttest score when the pretest is zero. The distance that
the line cuts the vertical axis above or below the origin is called
the intercept, and appears in the equation of the line as an additive
constant. The line in this example can be described by the equation

posttest = 80 + .85 (pretest).

That is, for any pretest score, multiply by .85, add 80, and the
corresponding vertical coordinate of the line will result. Clearly, when
the pretest score is 0, the posttest score is just 80. When the pretest
score is 400, the equation gives a posttest score of .85 x 400 + 80, or
340 + 80 = 420. The multiplicative constant, in this case .85, tells how
fast the line climbs as we move to the right. For every additional
pretest point, this line climbs .85 points. For every hundred additional
pretest points, the posttest climbs 85 points. The multiplicative
constant is called the slope of the line. If the slope is 1, the line
climbs at a 45-degree angle when pretest and posttest are plotted in the
same units. If the slope is 0, the line is horizontal and there is no
relationship between pretest and posttest scores. In the kinds of
relationships that we will be examining, the slope is usually between 0 and
1, unless one or both tests are too easy or too difficult. In such
cases, the relationship may not be a straight line, and will not be
characterizable by a single value of slope across the range of interest.
The simple models discussed in this manual will then not apply.
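
Readers with access to a general-purpose programming language may wish to
check the straightedge estimate against the least-squares definition given
above. The following is a minimal sketch in Python, not part of the SPSS
procedures of the appendices; the function name least_squares_line is our
own, and the data are the six illustrative students A through F.

```python
# Six illustrative (pretest, posttest) pairs for students A-F from the text.
pretest = [100, 200, 200, 300, 350, 400]
posttest = [180, 190, 260, 350, 350, 420]


def least_squares_line(x, y):
    """Return (intercept, slope) of the line minimizing the sum of
    squared vertical distances of the points from the line."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sum of squares about the means.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


intercept, slope = least_squares_line(pretest, posttest)
print(f"posttest = {intercept:.1f} + {slope:.2f} (pretest)")
```

For these six points the fitted line is roughly posttest = 78 + .83
(pretest), close to the line of 80 + .85 (pretest) read from the plot by
eye.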

We turn now to a suggested format for recording a program's data,
and for tabulating and summarizing scores manually to detect patterns of
change.



Recording Data

Monitoring language growth serves several important functions
related to the control of student flow through the program. First, an
estimate of the likely scores at the end of the semester for students with
various entering scores helps institutions plan for enrollment in courses
at various levels in the following semester. A discussion of approximate
methods for obtaining such estimates from locally derived score summaries
begins on page 7.

Second, individual students can be helped to estimate the likely
number of semesters of instruction they will need to achieve a particular
level of proficiency. It must, of course, be remembered that there is
considerable variation around any average level of performance.

A third use of score records is program evaluation. Is the sequence
becoming more or less effective over the years? Is a new textbook series
better than the current texts for the development of structure and written
expression? Such comparisons are introduced on page 13 and discussed
in detail in Appendix C. Does the course have as great an impact on
intermediate English students as on beginners? This last question is
complicated by the "regression effect," and was a major impetus for the
development of this manual. The question is introduced on page 14 and
followed by an outlined solution on pages 15-19. A summary of the
results of applying a computer program to illustrative data begins on page
19. Details of running the computer regression package are provided in
Appendix B.

By collecting data in a form suitable for easy retrieval and
analysis, and by using some of the techniques suggested in this manual,
such questions may be addressed with data from your institution. The use
of uniform data collection and reporting formats also facilitates pooling
of data across years, programs, and institutions. This makes it possible
to study such questions as the impact of various teaching approaches on
students with uncommon language backgrounds, who might not appear in
sufficiently large numbers in any one institution to make such a study
possible.

One useful format for record keeping is a continuous progress form,
with student identification and background information, instructional
history, grades, and test scores over the student's ESL career. An
example is given in Table 1. Four rows have been allotted to each
student. Programs that typically retain students for more semesters could
adjust the form accordingly. Space has been provided to record subscores
and totals of three institutional administrations of TOEFL each semester.
The second of the three TOEFL testings, given one week after the pretest,
is for the purpose of establishing a baseline for growth studies, and will
be discussed in the next section. Additional columns and the reverse
of the form could be used for instructor and program information and
additional student data. It is easy to keypunch scores from such records
directly onto cards or tape, as in Appendix A.

The following paragraphs discuss ways of summarizing such data
without a computer to make it easier to calculate averages and to develop
simple graphs that may clarify relationships between pretest and posttest.
To make the discussion concrete, we use TOEFL scores collected from a
group of students in one semester of an intensive ESL course.

Summarizing Data

A group of 98 students in a full-time intensive English course at San
Francisco State University took an institutional administration of TOEFL
as a pretest prior to, or at the beginning of, the fall semester. One
week after the beginning of the term, they took another form of the test
(about which more later). At the end of the 13-week semester, they were
administered a third form of TOEFL as a posttest.

The means and standard deviations of the pretest and posttest are
given in Table 2.

Table 2

Means and Standard Deviations,
San Francisco State Gain Study

              Mean     Standard Deviation

Pretest      399.77          61.57

Posttest     455.34          59.94

Program administrators are likely to want to know if a gain of
about 56 points can be expected at all points on the TOEFL scale, if
those with lower pretest scores are likely to grow relatively more, or if
those who start with higher pretest scores can be expected to grow
relatively more.

One way of examining this issue is to group posttest scores by
range of pretest score, as in Table 3. (The complete set of scores and
subscores is given in Appendix A.) The first individual's pre- and
postscores are 473 and 527, so 473 is in the pretest range 451-500. In
Table 3, 527 is entered at (a), the first posttest score in the pretest
range column 451-500. The second student's scores are 357 and 403. The
posttest score, 403, is thus the first entry at (b) in the pretest range
column 351-400. Once the 98 posttest scores have been entered, each
column total is divided by the number in that column to obtain the column
average. The average of the pretest scores for the column has also been
calculated from a similar table and entered at the top of each column.
Table 4 summarizes these averages.
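
The tabulation just described can also be sketched in a few lines of
Python for programs that keep their records in machine-readable form. This
is an illustration only: the function name pretest_range is our own, and
the list contains just the two students quoted above, not the full data
of Appendix A.

```python
from collections import defaultdict


def pretest_range(score, width=50, start=251):
    """Return the label of the 50-point pretest range containing score,
    using the ranges 251-300, 301-350, and so on."""
    low = start + ((score - start) // width) * width
    return f"{low}-{low + width - 1}"


# (pretest, posttest) pairs for the first two students cited in the text.
students = [(473, 527), (357, 403)]

# Enter each posttest score in the column for its pretest range.
columns = defaultdict(list)
for pre, post in students:
    columns[pretest_range(pre)].append(post)

# Divide each column total by the number of entries in that column.
averages = {label: sum(posts) / len(posts) for label, posts in columns.items()}
for label in sorted(averages):
    print(label, averages[label])
```

With all 98 pairs entered in the list, the same loop yields the column
averages of posttest scores; repeating it with pretest scores in place of
posttest scores gives the pretest averages for each column.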

Table 3

Posttest Scores for Various Pretest Ranges
(Prototype)

Table 4

Average Pre- and Posttest Scores

Pretest Range   251-300  301-350  351-400  401-450  451-500  501-550  551-600  601-650

Pretest Mean     293.5    327.8    373.5    426.2    469.5    523.5    557      613

Posttest Mean    385      384      441.1    478.5    511.6    570      603      583

Raw Gain          91.5     56.2     67.6     52.3     42.1     46.5

Even discounting the highest two "groups," which consist of one
student each, it is apparent that students with pretest scores below 400
tend to gain more than 56 points, while those with pretest scores above
400 tend to gain less than 56 points. We can see the relationship of
pretest to posttest more clearly by smoothing to a straight line with a
kind of shortcut graph called a stem-and-leaf plot. In this way of
laying out numbers, we arrange the first digits in order from the top, and
enter the final digit in its appropriate place to represent each number.
For example, the scores 550, 543, 547, 533, 523, 527, 527 are represented
as

55   0
54   3,7
53   3
52   3,7,7

Note that since 527 appears twice, the 52 row has two 7's, one for each
occurrence of 527.

This makes it clear at a glance that the median (middle value) of
this set of data is 533 (the fourth score from the top or bottom), and
that row 52 is the modal row (contains more observations than any other
row in the above set of data).
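
The same layout can be produced mechanically. The sketch below, in Python,
rebuilds the small stem-and-leaf plot above and picks out its median and
modal row; the function name stem_and_leaf is our own, and the median
shortcut used here assumes an odd number of scores.

```python
from collections import defaultdict


def stem_and_leaf(scores):
    """Group each score's final digit (leaf) under its leading digits
    (stem), with the highest stem first."""
    rows = defaultdict(list)
    for score in sorted(scores):
        stem, leaf = divmod(score, 10)
        rows[stem].append(leaf)
    return sorted(rows.items(), reverse=True)


scores = [550, 543, 547, 533, 523, 527, 527]
plot = stem_and_leaf(scores)
for stem, leaves in plot:
    print(stem, ",".join(str(leaf) for leaf in leaves))

# Middle value of the ordered scores (valid for an odd number of scores).
median = sorted(scores)[len(scores) // 2]
# Stem of the row holding the most leaves.
modal_stem = max(plot, key=lambda row: len(row[1]))[0]
print("median:", median, "modal row:", modal_stem)
```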

Table 5 shows the data of Table 3 recast as stem-and-leaf plots. In
the first column, which represents the pretest score range 251-300, the
four scores 420, 350, 387, 383 of Table 3 are represented as 0, 0, 3, and
7 in posttest rows 42, 35, and 38. The median of each column is
designated "M." Although the M's do not fall exactly on a straight line, a
prediction line that comes close to all of them has been drawn. At the
300/301 pretest boundary (a) this line is opposite a posttest score of
380, predicting an 80-point gain. The line cuts the 400/401 pretest
boundary (b) at about 455, consistent with the 56-point observed mean
gain. It intersects the 500/501 pretest boundary (c) at 530, predicting
an average gain of 30 points for students with such high pretest scores.
If we follow the line up to a pretest score of 600 (d), it predicts only
about a five-point raw gain. Thus, if we use this rough prediction line,
although the mean gain at a pretest score of 400 is about 55 points, each
additional 100 points on the pretest is associated with about 25
points less raw gain, or 75 additional posttest points for each 100
pretest points. This line is said to have a "slope" of .75. The rough
graph in Table 5 shows that for a group of students with pretest TOEFL
scores of about 375 (e), the predicted posttest after one semester of
intensive instruction was 450. However, almost one-fourth of this group
scored above 470, and about one-fourth scored 410 or less. Average
predictions can be guides to planning, but are far from predestination.

Table 5

Stem-and-Leaf Plots
Posttest Scores for Various Pretest Ranges
(Prototype)



Predicting Scores for Subgroups

Based on these raw gain estimates, we can predict likely posttest
scores for students with similar pretest scores in similar programs.
With larger samples, we could begin to develop separate predictions for
students with specific native language backgrounds and varying prior
academic experience. It is likely that different prediction equations
would fit students with comparable pretest scores from Indo-European vs.
non-Indo-European backgrounds (particularly for TOEFL subscores), and
students with extensive formal English study in their home country vs.
those with less prior English study.

To illustrate comparison of two groups, we make use of the fact that
the present sample of 98 students contained 68 first-semester students
and 30 returning students, whose "pretest" scores had been obtained as
posttests at the end of the summer, about two weeks before the beginning
of the fall term. These 30 students are identified in the data of
Appendix A by a 1 in column 79 of their first data card. One question
that can thus be addressed by these data is whether continuing students
are comparable to entering students in expected language growth. The
means and standard deviations for the subgroup of continuing students are
given in Table 6.

Table 6

Pre- and Posttest Means, Returning Students

              Mean     Standard Deviation

Pretest      409.93          48.05

Posttest     458.50          48.90

The pretest mean of this subgroup is about ten points above the
overall pretest mean, and the posttest mean about three points above the
overall posttest mean. The standard deviations are about 20 percent
smaller than for the total group. Referring to the prediction line in
Table 5, we see that these mean scores lie very close to the overall group
prediction line. If we move about one-fifth of the way from 400 to 450,
(f), corresponding to a pretest mean of 410, we find that the prediction
line is about even with a posttest score of 460. Thus, the returning
students' average scores fall only slightly below the prediction line for
all students, and from this rough analysis, it seems reasonable to assume
that about the same relationship between pretest and posttest applies
across groups. In Appendix B we will show how to develop a more precise
prediction line, using a commonly available computer package (SPSS). A
result of that analysis is explicit equations for the prediction lines.
In Appendix B, the equation for the overall line is

posttest = 119.103 + .8411 (pretest).

That is, the predicted posttest score corresponding to a given pretest
score is .8411 times the given pretest score plus a constant, 119.103.
For a pretest score of 200, the predicted posttest score is 287.323 (see
Glossary, page 27).
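
As a check on the arithmetic, the overall prediction equation can be
evaluated for any pretest score. The short Python sketch below uses the
constants quoted from Appendix B; the function name predict_posttest is
our own and is not part of the SPSS output.

```python
# Constants of the overall prediction line quoted from Appendix B.
INTERCEPT = 119.103
SLOPE = 0.8411


def predict_posttest(pretest):
    """Predicted posttest score for a given pretest score."""
    return INTERCEPT + SLOPE * pretest


for score in (200, 300, 400, 500):
    print(score, round(predict_posttest(score), 3))
```

Evaluated at the overall pretest mean of 399.77, the equation returns
about 455.35, in agreement with the posttest mean of Table 2, as it should
for a least-squares line.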

For the returning students only (Appendix C), the equation is

posttest = 113.32 + .8513 (pretest),

a line that is parallel to the overall line, and about 5.8 points lower
for a given pretest score. That more precise analysis suggests the
possibility of a "diminishing return" of about six points for continuing
students. But the small sample size (30 students) makes this no more than
a possibility, which would have to be confirmed over several semesters in
a particular program before it became a conclusion.

On the basis of the graphical analysis of Table 5, we would
predict that a student entering this program with a TOEFL total of 300
would gain about 80 points in one semester, to reach a score of 380.
Assuming that returning students' growth is similar to that of entering
students, we would expect a gain of 60 points in the second semester to
reach a score of 440. In the third semester, we would expect a further
gain of 45 points, to reach the score of 485. These predictions would be
considerably improved by actually following returning students over
several semesters, and by pooling several groups in similar curricula to
obtain a larger sample. They could be somewhat improved by following the
more precise estimation methods of Appendix B.
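
The semester-by-semester projection above amounts to applying the rough
prediction line repeatedly, each posttest serving as the next pretest. A
minimal Python sketch follows. It assumes the graphical line passes
through a posttest of 380 at a pretest of 300 with a slope of .75, as read
from Table 5, and, like the text, that returning students grow as entering
students do. The function names are our own.

```python
def rough_posttest(pretest):
    """Rough graphical prediction line: 380 at a pretest of 300,
    rising .75 posttest points per additional pretest point."""
    return 380 + 0.75 * (pretest - 300)


def project(entry_score, semesters):
    """Apply the prediction line once per semester, treating each
    predicted posttest as the pretest for the following semester."""
    scores = [entry_score]
    for _ in range(semesters):
        scores.append(rough_posttest(scores[-1]))
    return scores


print(project(300, 3))  # [300, 380.0, 440.0, 485.0]
```

The successive gains of 80, 60, and 45 points match those given in the
text. Such projections are averages only and inherit all the cautions
stated above.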

How much does this lower raw score gain for students with pretest
scores above the mean have to do with instructional effectiveness, and
how much with fallible measurement? We will address this problem in the
following section.

Change and Regression to the Mean: An Example of the Problem

Lucy bowls in the Alley Oops League. The league average is 150.
Last month she bowled a game of 100. If we know nothing else about Lucy,
it is reasonable to assume that to some extent she was last month (a) a
below average bowler, and (b) unlucky.

This month Lucy scored 200. In the absence of any further knowledge
about Lucy it would be reasonable to assume that to some extent she is (a)
an above average bowler, and (b) lucky this month. To what extent are
these differences the result of changes in Lucy's ability, and to what
extent are they due to luck? The question is important if we wish to
compare the two scores. If chance differences of this size are common, we
should conclude that she probably was and remains an average (150) bowler,
and that there is no real difference in the two scores, beyond chance
variation. On the other hand, if bowlers are usually quite consistent,
seldom varying by more than 20 pins from one month to the next, we should
conclude that there has probably been a real improvement in her bowling
ability. Problems in measuring change center on the issue of assigning
proportions of observed change to real growth and to luck. In Lucy's
case, measured change is somewhere between the observed change of 100, if
bowling scores are perfectly reliable, and 0, if bowling scores have no
reliability. Reliability is just an indicator of the proportion of
variance attributed to two possible sources of change. If half the change
was due to ability growth, and half due to luck, we say the reliability is
.5. If real ability growth represents a greater proportion of observed
change, reliability is proportionally greater.

Test scores, like bowling scores, have ability and chance components.
We can think of a "true" score analogous to a bowling average, and ask if
it changes over time. If we wish to measure ability, we would prefer that
luck played a small role in these considerations. But we recognize that
the particular selection of questions on the test form, how well the
student slept and ate the previous night and morning, and perhaps even the
humidity or pollen count, add random errors to the observed score. There
is a significant chance component in any test performance. This means
that very low-scoring individuals are probably not really quite as low as
they appear, because part of their low score was likely due to chance, or
bad luck ("negative error," in measurement terms), while very high-scoring
individuals are probably not really quite as high as they appear, because
part of their high score was likely due to good luck ("positive error").
Even if no real change in anyone's true score takes place, initial low
scorers will tend to score higher than their original scores on the
posttest, and initial high scorers will tend to score lower on the
posttest, simply because random chance factors over the two times of
measurement will tend to cancel out, making it less likely that the same
person will be equally lucky twice in a row.

Figure 2 shows the predicted pattern when no real gain takes place
(the posttest mean, like the pretest mean, is 2) but, as is always the
case, measurement contains errors.

Figure 2

Changes from Pretest to Posttest (r = .5)


The range of the hypothetical pretest in Figure 2 is 0-4, but of the
predicted posttest, only 1-3. The "missing" variance in the posttest is
chance variance, which cannot be predicted. The observed scores on the
posttest will probably still range from 0 to 4, but some who initially
scored 1 will score 0 on the posttest, most will score 1 or 2, and some 3
or 4, giving an average of 1.5 instead of 1, while some who initially
score 3 will score 0 or 1 on the posttest, most will score 2 or 3, and
some 4, giving an average of 2.5 instead of 3. All posttest scores will
shrink toward the average posttest score of 2.
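
The averages quoted for Figure 2 follow from a simple rule: the predicted
posttest lies a fraction r of the way from the mean to the pretest score.
The Python sketch below states that rule under the assumptions of the
figure (pretest and posttest means both 2, r = .5, and equal spread on the
two occasions); the function name predicted_posttest is our own.

```python
def predicted_posttest(pretest, r=0.5, pre_mean=2.0, post_mean=2.0):
    """Predicted posttest when no real change occurs: the pretest's
    distance from its mean is shrunk by the factor r.
    Assumes equal standard deviations on both occasions."""
    return post_mean + r * (pretest - pre_mean)


for pretest in range(5):
    print(pretest, "->", predicted_posttest(pretest))
```

Pretest scores of 0 through 4 give predicted posttest scores of 1, 1.5, 2,
2.5, and 3, the pattern described above. With r = 1 (perfect reliability)
the predictions would equal the pretest scores and no shrinkage would
appear.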

Given this situation, the uncritical observer may ignore the chance
variance and look at raw gains. If chance variance were 0, this would be
appropriate. However, in the presence of error, raw gain is misleading.
According to Figure 2, those with initial scores of 4 "lose" 1. This
interpretation is wrong unless the test is perfectly reliable on both
occasions. If reliability is less than perfect, such shrinking or
"regression" toward the mean posttest score must happen on the average,
even when no real change occurs.

One Solution to the Problem

How do we estimate true change to determine if programs are having
uniform effects for students of differing entering abilities, or to
determine the relative efficacy of different programs at different ability
levels?



One approach is to estimate the reliability of the test in the group
being studied, use this to predict the expected final score for each
pretest score under the assumption that no real change has taken place,
and call only observed discrepancies from this predicted score "change."
Applying this approach to the situation in Figure 2, we would say that a
student with an initial score of 0 and a final score of 1 had shown no
real growth, since he had merely kept up with the pre-post difference to
[...]bility; on the other hand, a student with [...] final score of 4
would be [...] of 1 point for [...] her own against a predicted loss
[...] change to be [...].

[...] regression, consider Lucy and her bowling scores once more. If Lucy
had bowled a 200 immediately following [...], with no time for
intervening [...], [...] conclude that her scores were [...] like those
of the student who moved from 0 to 1 because of [...], and less inclined
to attribute her 200 score in the [...] improvement. On the other hand,
[...] of 100 with [...] her scores as quite [...] estimates of her true
[...] chance variation and be thus more likely to view [...] later as
reflecting true gain, with a similarly [...] variation.

Applying the Solution to Test Scores 

We can apply this approach directly to the problem of measuring
change in test scores. By following the pretest almost immediately with
another test that we shall call a "reliability" test, given so soon that
little true change has had time to occur, we can estimate the change in
observed scores likely to take place simply because of measurement error
and test familiarization effects. The line predicting reliability test
score from pretest score, or "regression line," forms a "no-change"
baseline. Changes measured from this baseline, rather than "raw gain,"
take measurement error and practice effects into account.

Imagine a group [...] 400. Suppose that at the [...] the group average
climbs to 450. [...] that scored around 300 at the pretest, we [...]
posttest mean was 370. The students who scored around 500 at pretest, on
the other hand, might [...] 530. The low pretest group would gain 70
points and the high pretest group only 30 points.

The straight-line regression equation (Figure 3) that fits these
results is posttest = 130 + .8(pretest).

That is,  370 = 130 + .8(300)
          450 = 130 + .8(400)
          530 = 130 + .8(500).
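As a quick check on the arithmetic, the sketch below evaluates the same line at the three pretest scores and reports the raw gain (posttest minus pretest) at each. The function and variable names are illustrative only.

```python
def predicted_posttest(pretest, intercept=130.0, slope=0.8):
    """Posttest score predicted by the line posttest = 130 + .8(pretest)."""
    return intercept + slope * pretest


# Raw gain is the predicted posttest minus the pretest itself.
raw_gains = {pre: predicted_posttest(pre) - pre for pre in (300, 400, 500)}

for pre, gain in raw_gains.items():
    print(f"pretest {pre}: posttest {predicted_posttest(pre):.0f}, raw gain {gain:.0f}")
```

Because the slope is below 1.0, raw gain falls steadily as the pretest score rises, even though the same line describes every student.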
Here 130 is the "intercept," or predicted value at the posttest for a
hypothetical pretest score of 0, and .8 is the "slope." With a slope of
.8, the predicted posttest score goes up eight points for every additional
ten pretest points. In the presence of measurement error and parallel
tests, it is unusual for the slope to be as great as 1.0, unless the
variance of the posttest is much larger than the variance of the pretest.
A slope greater than 1.0 can occur, however, if gains are positively
correlated with pretest, that is, if those who start with higher scores
really gain much more than do those with lower scores. In such cases, any
regression to the mean would be counteracted by the large increase in
posttest variance. If pretest and posttest variances are about the same,
however, the slope will usually range from .65 to .95. In such more usual
cases as this example, with slope = .8, even with equal growth across the
score range, students with scores below 400 would be pushed up toward the
mean by unreliability, and students with pretest scores above 400 would be
pushed down toward the mean.

If a reliability test administered immediately after the pretest were
to yield a parallel baseline (same slope as posttest), e.g., test(B) =
90 + .8(pretest) (Figure 4), the no-change baseline for students with a
pretest score of 300 would be 330 and for those with a pretest score of
500, 490. The estimated true gain from 330 to 370 and from 490 to
530 points, respectively, would then be a uniform 40 points at both points
in the score range, even though raw gain appears greater for lower scores
(Figure 3).

If the prediction equation for the no-change baseline had been found
instead to have a flatter slope, such as .75, e.g., test(B) = 100 +
.75(pretest), the predicted baseline score for a pretest score of 300
would then become 100 + .75(300) = 325. For a pretest score of 500,
the baseline would be 100 + .75(500) = 475. The posttest gain from the
baseline for a pretest score of 300 would be 370 - 325 = 45, and for a
pretest score of 500, 530 - 475 = 55. In this case, even though raw gain
was less at higher scores, corrected gain would be slightly greater at
higher scores (Figure 5).
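The two baseline scenarios can be compared side by side. The following Python sketch takes corrected gain to be the predicted posttest minus the predicted no-change baseline at the same pretest score; the helper names are ours and the three lines are the ones given in the text.

```python
def line(intercept, slope):
    """Return a function that evaluates intercept + slope * pretest."""
    return lambda pretest: intercept + slope * pretest


posttest = line(130, 0.8)            # posttest prediction line
baseline_parallel = line(90, 0.8)    # reliability-test baseline, same slope
baseline_flatter = line(100, 0.75)   # reliability-test baseline, flatter slope


def corrected_gain(pretest, baseline):
    """Predicted posttest minus the predicted no-change baseline."""
    return posttest(pretest) - baseline(pretest)


for pre in (300, 500):
    print(
        f"pretest {pre}: raw gain {posttest(pre) - pre:.0f}, "
        f"corrected (parallel) {corrected_gain(pre, baseline_parallel):.0f}, "
        f"corrected (flatter) {corrected_gain(pre, baseline_flatter):.0f}"
    )
```

With the parallel baseline the corrected gain is constant; with the flatter baseline it grows with the pretest score, reversing the pattern seen in raw gain.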
Figure 3

Raw Gain - less at higher pretest scores

AB: Posttest = 130 + .8 x (pretest)
CD: Baseline = pretest
Figure 4

Gain from No-Change Baseline - uniform across pretest scores

AB: Posttest = 130 + .8 x (pretest)
EF: Baseline = reliability test prediction line
    90 + .8 x (pretest)

Figure 5

Gain from No-Change Baseline - greater at higher pretest scores

AB: Posttest = 130 + .8 x (pretest)
GH: Test (B) = 100 + .75 x (pretest)
We can perform these gain regression analyses graphically, by
comparing posttest scores with baseline scores predicted on a plot like that
of Table 5. However, with the wide availability of computer facilities,
and of students and staff experienced in using statistical analysis
packages, the use of a computer regression analysis program is suggested.
In the following pages, we summarize the results of a computer analysis
applying the above solution of estimating change from a reliability test
baseline to the data from our sample of students. Details of instructing
the computer to perform these analyses, and of the resulting output, are
in Appendix B.

Regression Using a Computer Package 

Appendix B gives the details of a regression analysis for the total
TOEFL score and for each of the three TOEFL subscores: Listening
Comprehension, Structure and Written Expression, and Reading Comprehension
and Vocabulary. The Statistical Package for the Social Sciences is
used for illustrative purposes, but other packages (e.g., SAS, BMD, or
Data Text) or locally available linear regression programs would serve as
well. Staff at your computer facility can tell you which program is most
economical for your data and analysis needs.

The results of the sample regression analyses are summarized and 
briefly interpreted in this section. 

Each of the analyses is based on three tests: pretest A, given at or
before the beginning of the semester; reliability test B, given one week
after the beginning of the semester; and posttest C, given at the end
of the semester. The purpose of the reliability test was to establish a
no-change baseline, making it possible to estimate the apparent "growth"
to be expected from measurement error and test familiarization or practice
effects. This baseline was established by relating test B to pretest A,
to determine an average prediction line. At posttest C, student growth
was assessed by comparing a student's test C score with his or her
predicted test B score (as predicted from pretest A), rather than with the
original pretest A score. Changes from A to predicted B scores were
assumed to result from factors other than instruction, and were thus
discounted in estimating gains due to instruction. Regression equations
are based on means, rather than medians. Thus, the few high scores in
Table 3 have more influence for these estimates than was the case in the
graphical approximation.
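The procedure just described can be sketched in a few lines of Python: fit one least-squares line predicting test B from pretest A, fit another predicting test C from pretest A, and take the difference of the two predictions at each pretest value of interest. The five score triples below are invented purely for illustration and are not taken from the sample; the manual's own analysis was run in SPSS (Appendix B), so this is only a stand-in for that computation.

```python
def fit_line(x, y):
    """Ordinary least-squares fit; returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def gain_table(a, b, c, pretest_values):
    """For each pretest value return (pretest, raw gain, corrected gain).

    Raw gain is predicted C minus pretest; corrected gain is predicted C
    minus predicted B, i.e. change measured from the no-change baseline.
    """
    b_int, b_slope = fit_line(a, b)   # baseline: test B on pretest A
    c_int, c_slope = fit_line(a, c)   # posttest: test C on pretest A
    rows = []
    for pre in pretest_values:
        pred_b = b_int + b_slope * pre
        pred_c = c_int + c_slope * pre
        rows.append((pre, pred_c - pre, pred_c - pred_b))
    return rows


# Hypothetical scores for five students (pretest A, test B, posttest C).
test_a = [310, 380, 420, 470, 540]
test_b = [335, 390, 440, 475, 545]
test_c = [380, 430, 470, 500, 560]

for pre, raw, corrected in gain_table(test_a, test_b, test_c, [300, 400, 500]):
    print(f"pretest {pre}: raw gain {raw:.1f}, corrected gain {corrected:.1f}")
```

Any regression routine that returns an intercept and slope would serve equally well in place of `fit_line`.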

For the total TOEFL scores, the reliability test B showed the
following predicted scores for various pretest A scores.

Table 7

Total Pretest and Reliability Scores

Reliability Test B   323.0  371.5  419.9  468.4  516.8  565.3  613.7
Pretest A            300    350    400    450    500    550    600

Thus, with no real change, lowest-scoring students would appear to
gain 23 points and highest-scoring students, only about 14 points.

This differential is in the direction discussed previously, but it
is not a large discrepancy. Indeed, when we look at posttest scores
(Table 8), we find the corrected gain on total scores remains larger for
students with lower pretest scores.



Table 8

Total Scores and Gains

Posttest C             371.4  413.5  455.5  497.6  539.6  581.7  623.7
Reliability Test B     323.0  371.5  419.9  468.4  516.8  565.3  613.7
Pretest A              300    350    400    450    500    550    600
Raw Gain C - A          71.4   63.5   55.5   47.6   39.6   31.7   23.7
Corrected Gain C - B    48.4   42.0   35.6   29.2   22.8   16.4   10.0

Even after correcting for test reliability, a student with a pretest score
of 300 is estimated to gain about 13 points more than a student with a
pretest score of 400 (48.4 versus 35.6). The graphs of these relationships
are given in Figures 6 and 7.

However, examination of the subtest scores reveals that most of this
differential growth is concentrated in one TOEFL subtest, Listening
Comprehension, subtest 1. Table 9 gives predicted C1 and B1 scores for
various A1 scores:

Table 9

Listening Comprehension Scores and Gains

Posttest C             41.7  44.9  48.2  51.4  54.7  57.9  61.2
Reliability Test B     34.4  38.3  42.2  46.2  50.1  54.0  57.9
Pretest A              30    35    40    45    50    55    60
Raw Gain C - A         11.7   9.9   8.2   6.4   4.7   2.9   1.2
Corrected Gain C - B    7.3   6.6   6.0   5.3   4.6   3.9   3.3
The corrected Listening Comprehension gains are thus seen to follow a
pattern similar to that of the TOEFL total scores, with lower-proficiency
students registering considerably larger gains than those achieved by
students with higher pretest scores.

Table 10 gives the pattern for subtest 2, Structure and Written 
Expression. 



Table 10

Structure and Written Expression Raw and Corrected Gains

Posttest C             36.8  40.5  44.2  47.9  51.7  55.4  59.1
Reliability Test B     34.2  38.0  41.8  45.7  49.5  53.3  57.1
Pretest A              30    35    40    45    50    55    60
Raw Gain C - A          6.8   5.5   4.2   2.9   1.7   0.4  -0.9
Corrected Gain C - B    2.6   2.5   2.4   2.2   2.2   2.1   2.0



Although the raw gain is dramatically less for students with high
pretest scores, even becoming negative for a pretest score of 60, the
corrected gain is seen to be nearly uniform across the score range of
the Structure and Written Expression subtest. This subtest does not
contribute materially to the lessened gain for students with high pretest
scores noted for the total TOEFL scores.

Table 11 gives raw and corrected gains for subtest 3, Reading
Comprehension and Vocabulary.

Table 11

Reading Comprehension and Vocabulary Raw and Corrected Gains

Posttest C             35.9  40.4  44.8  49.3  53.7  58.2  62.6
Reliability Test B     32.8  37.5  42.2  46.9  51.6  56.3  61.0
Pretest A              30    35    40    45    50    55    60
Raw Gain C - A          5.9   5.4   4.8   4.3   3.7   3.2   2.6
Corrected Gain C - B    3.1   2.9   2.6   2.4   2.1   1.9   1.6
Although higher proficiency students gain slightly less on this
subtest, corrected gains across the middle range of scores are again
reasonably uniform. The Reading Comprehension and Vocabulary subtest thus
does not contribute strongly to the diminishing growth found for higher
proficiency students with the TOEFL total scores. We can conclude that
the observed pattern in the TOEFL total scores stems primarily from the
lower growth for intermediate and higher level students observed in the
Listening Comprehension subtest scores. It may be that new students'
growth in comprehension of the English phonological system is quite rapid,
but, after this is achieved, that further growth in Listening
Comprehension depends on the same factors that influence Structure and
Vocabulary.

This interpretation is supported by the Listening Comprehension
results for the 30 returning students presented in Appendix C. Listening
Comprehension posttest scores for this subgroup are given in Table 12.

Table 12

Listening Comprehension Raw and Corrected Gains
30 Continuing Students

Posttest C             37.4  41.4  45.4  49.4  53.4  57.4  61.3
Reliability Test B*    34.4  38.3  42.2  46.2  50.1  54.0  57.9
Pretest A              30    35    40    45    50    55    60
Raw Gain C - A          7.4   6.4   5.4   4.4   3.4   2.4   1.3
Corrected Gain C - B    3.0   3.1   3.2   3.3   3.3   3.4   3.4

*Based on total group data equation

Growth is almost perfectly uniform across the scale for this group. 
It appears that, after a semester of familiarization with spoken
English, a student who still has a low Listening Comprehension score is
not likely to exhibit the rapid growth shown by newly entering students.

Summary 

Because of measurement error, raw gain scores (i.e., simple
differences of posttest and pretest scores) tend to overestimate real
growth for initially low-scoring students and to underestimate gain for
initially high-scoring students. By following a pretest with a reliability
administration, it is possible to estimate the probable apparent change
due to unreliability and practice effects, and to discount these in
estimating gain due to instruction. The technique requires that we fit a
prediction line to the test B (reliability test) scores corresponding to
each pretest score, and consider change due to instruction at posttest to
be measured by deviations from this prediction line, rather than from
pretest or reliability test observed score. For example, a given student
might score 300 on the pretest (test A) and 330 on test B. If the
regression equation relating all test B to test A scores were found to be
test B = 100 + .8(test A), the student's expected score on test C,
assuming no further change, would be the test B prediction,
100 + .8(300) = 340. If the actual test C score were 400, we would
estimate gain as 400 - 340 = 60 points, rather than the raw gain of 100.
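For a single student the calculation reduces to one prediction and two subtractions. The Python sketch below works the example from the summary; the function name and return layout are ours, and the default coefficients are the illustrative 100 and .8 used in the text, not fitted values.

```python
def estimated_gain(pretest_a, posttest_c, intercept=100.0, slope=0.8):
    """Return (expected no-change score, corrected gain, raw gain).

    The expected score is the test B prediction from pretest A; the
    student's own observed test B score is not used.
    """
    expected = intercept + slope * pretest_a
    return expected, posttest_c - expected, posttest_c - pretest_a


expected, corrected, raw = estimated_gain(pretest_a=300, posttest_c=400)
print(f"expected {expected:.0f}, corrected gain {corrected:.0f}, raw gain {raw:.0f}")
```

Note that the student's observed test B score of 330 does not enter the estimate; only the prediction line does.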
Glossary

Analysis of covariance--the method used to determine the shared variation
of two or more related variables. For example, a pretest is used as a
covariate to predict a posttest. Then standard analysis of variance is
used to estimate the effect of a treatment on the residual variation in
posttest scores not accounted for by the pretest.

Average--the sum of the measures, items, scores, etc., divided by their
number or frequency.

Chance variation--the variation that one would expect from the scores of
equivalent forms given close in time without instruction or mistakes.

Correlation--the amount of similarity in degree and direction between two
sets or ranks of variables; a measure of the degree to which knowledge of
one set allows one to predict the other set.

Equivalent forms--forms of a test that are so similar they can
be used interchangeably and yet are not identical; two or more test forms
that yield about the same mean and variability of scores, and whose items
are similar with respect to type, difficulty, distribution of item-test
correlations, and representative coverage of content.

Mean (average)--the sum of the measures, items, scores, etc., divided by
their number or frequency.

Measurement error (standard error)--the deviation from the true score that
is due to chance variation. For a given observed score, the specific
value of the measurement error is unknown, but the average error of a set
of scores describes their precision.

Median--the middle score in a distribution or set of ranked scores; the
point (score) that divides the group into two equal parts; the 50th
percentile; a measure of central tendency.

Mode--the score or value that occurs most frequently in a distribution; a
measure of central tendency.

Posttest--a test given at the conclusion of an educational project or
treatment to determine posttreatment status of the examinee or group in
regard to some skill, aptitude, or achievement.

Pretest--a test given to determine the status of the examinee or group in
regard to some skill, aptitude, or achievement, as a basis for judging the
effectiveness of subsequent treatment.

Probability--if there is a known number, p, of possible occurrences of an
event and q possible nonoccurrences, and if each of the total, p + q,
possible outcomes is equally likely, then the probability of the event is
p / (p + q).
Regression effect or regression to the mean--tendency of a predicted score
to be nearer to the mean of its distribution than the score from which it
is predicted is to its mean. Because of the effects of regression,
students making extremely high or extremely low scores on a test tend to
make less extreme scores, i.e., closer to the mean, on a second
administration of the same test or on some predicted measure. In general,
the greater the errors of measurement and prediction, the more pronounced
is the regression effect. For example, the heights of parents and of
their children are related, but one cannot be perfectly predicted from the
other. If we select the ten tallest individuals in the world, it is
extremely likely that their average height exceeds the average height of
their parents, but it is also extremely likely that their average height
will exceed the average mature height of their children. This will be
true even if the average height of the entire population is increasing
slightly from one generation to the next.

Regression (line)--if two paired lists of numbers, say pretest scores and
posttest scores, are plotted in two dimensions, say pretest horizontally
and posttest vertically, then there is exactly one straight line that can
be drawn through the plot so that it passes closest to the means of all
those sets of posttest scores that correspond to each pretest score. On
the average, for all pretest scores, this is the best straight-line fit to
the observed posttest scores.

Reliability--the extent to which a test is consistent in measuring
whatever it does measure; dependability, stability, trustworthiness,
relative freedom from errors of measurement.

Slope--the steepness of ascent of a straight-line graph. If the line is
described by the equation Y = mX + c, where Y represents the vertical
axis, the value of Y will increase m units for each unit increase in X,
and m is the slope. For example, if Y = 0.5X + 100, and X increases from
200 to 300, Y will increase half as much, from 200 to 250.

Standard deviation--a measure of the variability or dispersion of a
distribution of scores. The more the scores cluster around the mean, the
smaller the standard deviation. For a normal distribution, about two
thirds (68.3 percent) of the scores are within the range from one S.D.
below the mean to one S.D. above the mean.

True score--a score entirely free of error; hence, a hypothetical value
that can never be obtained by testing, which always involves some
measurement error. A "true" score may be thought of as the average score
from an infinite number of measurements from the same or exactly
equivalent tests, assuming no practice effect or change in the individual
during the testings.

Variance--a measure of variability equal to the square of the standard
deviation; the average of the squared deviations from the mean. The
variance of the sum of independent random variables is the sum of their
variances. This makes the measure useful in theory. For practical
purposes, the percent of the standard deviation explained may be more
meaningful than is the percent of variance explained.
Setting Up the Data

Three scores will be available for each test being analyzed (we will
discuss missing data later). If these scores have been entered in a
cumulative record form, such as that illustrated in Table 1 of the text, a
single card can be punched for each student. This card will contain the
student ID number, program code, and pretest, reliability test, and
posttest scores in a fixed set of columns. Additional information, such
as native language, number of years of English study, or teacher ratings
may also be included for further analyses.

If the scores have not been copied onto a common record form, but are
on separate lists, it is often easier to punch up one card for each
testing occasion, with student ID number, score for a given test in the
same column on each card, and a test occasion number (1 = pretest, 2 =
reliability test, 3 = posttest) in column 80. Additional variables need
be punched only on card one. The three resulting decks of cards may then
be stacked in order, with deck one on top, and run through a card sorter
once for each column of the student ID number, starting with the
rightmost identification digit. The resulting merged deck will have cards
in order within student ID's in numerical order. Listing this deck makes it
easy to spot breaks in the 1, 2, 3 sequence and to pull out cards for
students who missed one or more testings. In our example, student ID's
(numbers ranging from 001 to 111) are punched in the first three columns
of each card. Column 4 is left blank, and the three TOEFL subscores and
TOEFL total are punched sequentially on a separate card for each of the
three tests. Listening subscores are in columns 5 and 6, Structure and
Written Expression in columns 7 and 8, Reading Comprehension in 9 and 10,
and TOEFL total scores in 11 through 13. Column 80 contains the testing
occasion, and, because some students were pretested on an earlier date
than others, column 79 identifies such students with a numeral 1. The
cards were sorted and incomplete sets removed. This resulted in complete
data for 98 students. A listing of the control and data cards is given in
Table A-1. The explanation of the control cards, such as variable list,
input format, and scattergram, is given in Appendix B.
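The card-sorter procedure described above is a least-significant-digit sort: one stable pass per ID column, rightmost column first, leaves the deck in ID order with the three occasions still in 1, 2, 3 order within each student. The Python sketch below imitates it on card images held as 80-character strings. The helper names and the sample scores are hypothetical; only the column layout (ID in columns 1-3, occasion in column 80) comes from the text.

```python
def make_card(student_id, scores, occasion):
    """Build an 80-column card image: ID in 1-3, scores from 5, occasion in 80."""
    return f"{student_id} {scores}".ljust(79) + str(occasion)


def parse_card(card):
    """Return (student ID, occasion) from columns 1-3 and column 80."""
    return card[0:3], card[79]


def sort_deck(cards):
    """Mimic the card sorter: one stable pass per ID column, rightmost first."""
    deck = list(cards)
    for column in (2, 1, 0):                          # ID columns 3, 2, 1
        deck = sorted(deck, key=lambda card: card[column])  # sorted() is stable
    return deck


def complete_ids(deck):
    """IDs whose cards show the full 1, 2, 3 occasion sequence."""
    occasions = {}
    for card in deck:
        student_id, occasion = parse_card(card)
        occasions.setdefault(student_id, []).append(occasion)
    return [sid for sid, seq in occasions.items() if seq == ["1", "2", "3"]]


# Hypothetical decks; student 003 missed the reliability test.
deck_one = [make_card(sid, "404040400", 1) for sid in ("002", "001", "003")]
deck_two = [make_card(sid, "424242420", 2) for sid in ("001", "002")]
deck_three = [make_card(sid, "454545450", 3) for sid in ("003", "001", "002")]

merged = sort_deck(deck_one + deck_two + deck_three)
for card in merged:
    print(parse_card(card))
print("complete sets:", complete_ids(merged))
```

Stacking deck one on top matters: because each pass is stable, cards with the same ID keep their pretest, reliability test, posttest order.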
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01% 483736^03 
RUN NAME       LANGUAGE GAIN ANALYSIS
VARIABLE LIST  A1,A2,A3,ATOT,B1,B2,B3,BTOT,C1,C2,C3,CTOT
INPUT MEDIUM   CARD
INPUT FORMAT   FIXED(4X,3F2.0,1F3.0/4X,3F2.0,1F3.0/4X,3F2.0,1F3.0)
N OF CASES     98
PEARSON CORR   A1 TO CTOT
OPTIONS        5
STATISTICS     1
READ INPUT DATA
ddl 5^^^^^^73 2 
001 575^^7527 5 

001 555053527 ,7 

002 423^31357 
002 ^6^137^13 

002 smsis^Q^ 

003 c^33232357 
003 ^2383x370 

003 50^133^13 

004 4846'^&^67 
00^ ^8^947480 
004 525248507 
006 513733403 



006 52^542463 | 

006 5840^3470 ^ 
050 512734373 I 
050 463237383 3 
050 503641423= ^; 

007 473433380 ' 2 
007 423740397 V 3 

007 543840440 

008 433633373 2 
008 394Q3839Q j 

008 414037393 

.009 393032337 2 

009 352629300 3 

009 443235370 " • 
OIQ 462428327 2 

010 493334387 ^ , 
did 573659440 * , 

011 362531307 ' ^ 
Oil 433334367 3 
Oil 41^3638407 j 



2 



013 524146463 . | 

013 594244483 

014 382834333 ^ 
dl4 383930357 ' ^ 

014 423037363 * 

015 352733317 i 
dl5 393228330 " Z 
015 423732370 

017 502537373 2 

017 46^i364l0 " , 

017 534S36430 ' • ^ 

018 504036420 2 
018 443539395 * * 

018 504X39453 ' -j 
dl9 413531357 2 

019 393027320 3 
019 444035390 
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02b 362727300 . . 1 

020 393329337 ' 2 

020 513837^20 3 
02a ^53737397 t 

021 '•9^135^17 . 2 
d2i 52^3^0^50 3 

022 312833307 1 
022 353228317 2 

022 <^<.3533373 3 

023 ^^3^33370 ^ 11 
023 ^5393^5393 ' 2 

023 533637^20 3 
02^ 525252520 - 1 
02^ 595355557 2 

024 645657590 \ 3 

025 372033300 ^ * ' 1 
025 393232343 2 

025 403035350 . 3 

026 433839400 ' ' 1 
026 453544413 2 

026 564156510 3 
G27 484541447 1 

027 524248473 2 

027 625251550 . ^ ^ . . 3 

028 483842427 11 
028 484540443 2r 

028 544244467 3 

029 363729340 , ' 1 

029 393032337 ' 2 
0^9 353236343 3 

030 413935383 « j 
030 432743377 2 
030 544348483 I 
033 363025310 " 1 
033 342631303 v 2 

033 393530347 * ' 3 

034 533942447 v H 
034 514445467 2 

034 544542470 5 
036 303134317 1 
036 354437387 j ' 2 

036 423840400 ^ / 3 

037 453536387. ^ 11 
037 423536377 2 

037 443842413 3 

038 413030337 11 

038 393327330 ' 2 

035 453633480 3 

039 434345437 1 
039 543752477 2 

039 603748483 3 

040 434241420 l 
040 ^1^5^6^37 2 
040 41444443d 3 
042 504439443 11 
042 544435443 2 

042 524542463 " 3 

043 S04643463 li 
043 514951503 • 2 
043 525850533 3 



0^^ ^^^338^17 

d^^ 55^9^^503 

0^»5 595553557 

045 59515^5^7 

0^5 6^5859603 

0^6 3322312>7 

0^6 3?2:5^3 3 3 
0^S--'3^3^5387 

-its 51itl«t5it57 



0<»8 5l4l47<»63 
0^8 53^9^5^90 
051 606361613 
051 616157597 

051 605956583 

052 322831303 
052 36:^631310 

052 423633370. 
041 483948450 
041 ^93347430 

041 59425351 3 * ' , Vr 

053 545142490 
053 555244503 

053 585353547 

054 392735357 
C54 383533353 

054 423331353 

055 <i43636387 
055 494134413 

055 56^542417 

056 534139443 
056 50444747Q 

056 574^i^«7493 

057 433529357 
057 37«*03437Q 

057 50^540450 

058 504242447 
0^8 494<»41447 

058 54^144463 
05? 483839417 

059 ^6424^^^0 

059 504545467 

060 513954411' 
060 54^«137440 

060 524538450 

061 564239457 
061 594543490 

061 57-!.4':.b47d 

062 3833323^3 
062 393D29327 

062 ^13531357 

063 353930347 
063 393735370 

063 45373S4QQ 
06^ 553440430 

064 583952497 
06^ 584647503 

065 483830387 
065 474236417 
065 5238<iO';^33 
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066 


^33838397 / - 
^♦555«?5^10 












1 


066 












2 


066 


533748460 












3 


067 


4U035387 












i 


067 


38^037383 












067 


483837410 












3 


068 


27^2^0363 












I 
Z 
3 


068 


524141447 












068 


5951^^513 






• 






069 


5^^3^3^67 












X 


069 


53^449^87 












2 


069 


59^5^7^97 












3 
1 


070 


2«33S3338Q 












D7P 


413638390 












2 


070 


^Ui39^!e 












3 


073 


3338313^0 












1/ 
Z 


nil 


'402729520 












073 
074 


41363838S 
45374U10 












3 

ii 


07^ 


52^7^1^67 












z 


07^ 


5D5l«*5^i87 














075 


484539440 












i 


075 


^63937^07 












2 

. 3 

i 


075 


5342412*53 












076 


5^45494^3 












076 


5^^250^87 












2 


076 


555250523 












3 
1 1 


077 


584555527 












077 


594757543 












• 2 


077* 


594759550 












3 


078 


393235353 












1 


078 


423437377 












2 


076 


504340443 












3 
I 


080 


485043470 












080 


455544480 










- 




080 


565349527 












3 


082 


424039403 












1 


082 


474443447 












2 


082 


524047463 












3 


083 


494851493 












1 
2 


063 


50545553d 




« 








083 


555260557 












3 
11 


08^ 


534152487 












08^ 


574448497 












2 


08^ 
065 


544353500 
514345463 












3 
1 1 


055 


544646487 












2 

3 


085 


595252543 












086 


312233287 












1 


086 


352731310 












2 


086 


443536383 












3 
1 


D87 


343228313 












087 


36253.1307 












2 


087 
088 


423633370 
484138423 












3 

ii 


088 


524536443 












2 


088 


574345483 












3 


089 


443640400 












1 
2 


089 


454844457 












089 


544149480 
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090 56<^13?^53 
090 5A^6^1^70 

090 57^6^3^67 
on ^*^^332397 
C91 ^6iiZ^l^lO 

091 56<*9^i3<*?3 

092 383031330 
092 ^2^035390 

092 ^^^138410 

093 ^•^«39^17 
093 43^»2^»5^»33 
093 56^»7^9507 
D95 ^124233390 
095 C»^»^»539<»27 

095 485041^i63 

096 ^53236377 
096 3^3835357 

096 ^♦9^»5^0^^7 

097 <i3363^377 
097 423937393 

097 564133433 

098 483836407 
098 493841427 

098 573846470 

099 483830387 
099 463437390 

099 474337423 

100 503735407 
100 404537407 

100 523539420 

101 363830347 
Idl 414435400 
IQl 463537393 

102 504337433 
102 544543473 

102 544942483 

103 3B3537367 
103 41333^377 

103 433644410 

104 474345450 
104 494546473 

104 545655550 

105 464843457 
105 495142473 

105 544944490 

106 48323939? 
106 44373639Q 

106 533534407 

107 434036397 
107 464639437 

107 554641473 

108 484642453 
108 ^544850507 

108 635448550 

109 533839433 

109 524541460 
-109 544443470 

110 393131337 
110 353130320 
lib 463331367 



in ^739^0^20 

111 ^9<f7^^^67 

111 5<i^i7^7^95 

SCATTERGRAM    BTOT(250,650),CTOT(250,650) WITH ATOT(250,650)
STATISTICS     ALL
SCATTERGRAM    B1(20,70),C1(20,70) WITH A1(20,70)
STATISTICS     ALL
SCATTERGRAM    B2(20,70),C2(20,70) WITH A2(20,70)
STATISTICS     ALL
SCATTERGRAM    B3(20,70),C3(20,70) WITH A3(20,70)
STATISTICS     ALL
FINISH

Appendix B

Analyzing and Interpreting the Data



Reading the Data 

To enable the computer to read the information, the names of the
variables and where to find them must be given. In SPSS, this is
accomplished with four cards, immediately following the required RUN NAME
card.

The first, VARIABLE LIST, gives a list of variable names separated by
commas. We have four scores on each of three cards, and have decided to
call them A1, A2, A3, ATOT, B1, B2, etc. These labels are punched on the
VARIABLE LIST card beginning in column 16.

The second card, INPUT MEDIUM CARD, with "CARD" beginning in column
16, is self-explanatory.

The third card, INPUT FORMAT, indicates the location of each variable
in order on the cards. Beginning in column 16 with FIXED, the card
contains a FORTRAN format statement (4X, 3F2.0, 1F3.0/4X, 3F2.0,
1F3.0/4X, 3F2.0, 1F3.0). This code instructs the computer to skip the
first four ID spaces, read three two-digit numbers (which it will assign
to variables A1, A2, and A3), read a three-digit number, which it will
assign to variable ATOT, skip to the next card, read the four B variables
from the reliability test in the same format, skip to the third card, and
read the posttest C scores to complete one case.

The final control card in this sequence, N OF CASES 98, indicates the
number of times this procedure must be repeated to complete reading the
data.
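For readers working outside SPSS, the same fixed-column layout can be read with plain string slicing. The Python sketch below is our own rough equivalent of the format statement, not SPSS or FORTRAN behavior in every detail (for instance, it assumes every field is fully punched with digits). The three sample cards are those listed for student 098 in Table A-1.

```python
VARIABLES = ["A1", "A2", "A3", "ATOT",
             "B1", "B2", "B3", "BTOT",
             "C1", "C2", "C3", "CTOT"]


def read_card(card):
    """Apply 4X,3F2.0,1F3.0 to one card image.

    Skips the first four columns, then reads three two-digit fields
    and one three-digit field as numbers.
    """
    fields = (card[4:6], card[6:8], card[8:10], card[10:13])
    return [float(field) for field in fields]


def read_case(three_cards):
    """Combine pretest, reliability test, and posttest cards into one case."""
    values = []
    for card in three_cards:
        values.extend(read_card(card))
    return dict(zip(VARIABLES, values))


# Cards for student 098 as listed in Table A-1 (occasion column omitted).
case = read_case([
    "098 483836407",
    "098 493841427",
    "098 573846470",
])
print(case)
```

Repeating `read_case` once per student, 98 times in the sample, corresponds to the N OF CASES card.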



Performing the Analyses

The basic descriptive statistics--means, standard deviations, and
correlations--are obtained with a single set of control cards. The
equations of the prediction lines are obtained with a set of control
cards for each test score.

The basic descriptive statistics--means, standard deviations, and
correlations--are obtained with a single set of control cards:

PEARSON CORR   A1 TO CTOT
OPTIONS        5
STATISTICS     1
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These are followed by the 



-READ INPUT DATA card^ and by the "data deck. 

The first page of the computer output, listing the control cards and
showing which card columns were read for each variable, is reproduced as
Table B-1 (page B.10).

The resulting univariate statistics and correlations are given
in Tables B-2 to B-4.

The release of SPSS used at the ETS computer facility requires that
subsequent analysis request cards after the first set follow the data
deck. This requirement may vary with other releases of the SPSS package.

The next analysis performed (total test score) is a prediction of the
total reliability test, B, from the pretest scores, A, to establish a
no-change baseline. This is followed by a prediction of total posttest
scores, C, from A.

Change for each pretest score is the difference between C and B:
predicted posttest score minus predicted baseline for that value of
the pretest. Both analyses may be performed with a two-card request
immediately following the data deck:

SCATTERGRAM    BTOT (250,650), CTOT (250,650) WITH ATOT (250,650)

STATISTICS     ALL                              (Table B-5)

This request first shows the relationship of BTOT with ATOT, followed by
the estimates needed for the baseline prediction equation. Then the
relationship of CTOT to ATOT is plotted, followed by the estimates
required for the posttest prediction equation. The ranges (250,650) scale
the plots for easier readability. The range (200,700) would also work.



Table B-6 gives the plot of the BTOT observations (on the vertical
axis) corresponding to each observed value of pretest score for each of
the 98 students. Each asterisk represents one student's pair of A and B
total scores. The numeral 2 represents pairs of scores occurring for two
different individuals.

The scatterplot shows a strong, quite linear relationship between
pretest and reliability test, with a concentration of scores in the lower
two-thirds of the two score ranges. The 297-point range of the
reliability test, 300-597, is slightly less than the 326-point range of
the pretest.

Table B-7 gives the information necessary to determine the prediction
equation for the no-change baseline. The two underlined quantities,
INTERCEPT (A) = 32.33932 and SLOPE (B) = 0.96865, give the constants of
the baseline equation: BTOT = 32.339 + .969 ATOT.
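The intercept and slope reported by SPSS are ordinary least-squares estimates, and they can be checked by hand or with a few lines of code. The sketch below, in Python, assumes paired lists of pretest and reliability-test totals. The function name `fit_line` and the five pairs of scores are hypothetical illustrations, not the data of the sample run.

```python
def fit_line(x, y):
    """Least-squares intercept, slope, and correlation for predicting y from x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / (sxx * syy) ** 0.5
    return intercept, slope, r


# Hypothetical pretest (A) and reliability-test (B) totals for five students.
atot = [300, 380, 420, 500, 560]
btot = [330, 395, 440, 510, 585]
intercept, slope, r = fit_line(atot, btot)
print(f"BTOT = {intercept:.3f} + {slope:.3f} x ATOT   (r = {r:.3f})")
```

The three returned values correspond to the INTERCEPT (A), SLOPE (B), and CORRELATION (R) entries of the printout.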
This line has been drawn on the scatterplots of Tables B-6 and B-8.



Plotting the No-Change Baseline

The baseline can be plotted by choosing any two convenient values of
ATOT (300 and 600, for example), calculating the corresponding predicted
values of BTOT from the prediction equation, and plotting the two ATOT,
BTOT values on the graph. Thus,

BTOT (1) = 32.339 + .969 x 300 = 323.0, and
BTOT (2) = 32.339 + .969 x 600 = 613.7

As a check, it is a good idea to choose another value, say ATOT (3) =
500, yielding BTOT (3) = 516.8. The three points should lie on a single
straight line. If they do not, recheck the calculations and plotting.
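The three-point check can also be done numerically before anything is drawn. The following Python sketch uses the baseline constants from Table B-7. The helper names and the tolerance are assumptions made for illustration only.

```python
def baseline(atot):
    """No-change baseline from Table B-7: BTOT = 32.339 + .969 x ATOT."""
    return 32.339 + 0.969 * atot


# Two plotting points plus one check point, rounded as one would plot them.
points = [(x, round(baseline(x), 1)) for x in (300, 500, 600)]


def on_one_line(p, q, r, tolerance=0.01):
    """True if the slope from p to q matches the slope from q to r."""
    slope_pq = (q[1] - p[1]) / (q[0] - p[0])
    slope_qr = (r[1] - q[1]) / (r[0] - q[0])
    return abs(slope_pq - slope_qr) <= tolerance


print(points)
print(on_one_line(*points))
```

If the last line prints False, one of the three values was miscalculated.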

Table B-8 gives the scatterplot relating pretest and posttest scores.
The range (343-603) of posttest scores has further diminished to 260
points. The lowest posttest score, 343, is 56 points higher than the
lowest pretest score, and the highest posttest score is 10 points lower
than the highest pretest score.

Table B-9 gives the estimates for the constants in the equation
predicting posttest from pretest:

CTOT = 119.103 + .841 ATOT



This equation predicts that a student with a pretest score, ATOT, of 300
would be expected to score around 371.4 at posttest, CTOT, a raw gain of
71.4 points, while a student with a pretest score of 500 would be expected
to score around 539.6, a raw gain of 39.6, or about 32 points less than
that anticipated for the lowest-scoring students.

If we graph this prediction equation and compare it with the slope =
1 "posttest = pretest" dashed baseline that is implicit in using raw gain,
it shows that initially low-scoring students gain much more than do
initially high scorers. Instead of comparing raw gains, however, we wish
to compare gains from baseline. Comparing posttest scores with the
no-change baseline BTOT = 32.339 + .969 x 300 yields an estimated gain for
a student with a pretest score of 300 of CTOT - BTOT = 371.4 - 323.0 =
48.4 points. For a student with an initial score of 500, the estimated
gain is 539.6 - 516.8 = 22.8 points. Thus, although the discrepancy is
reduced from the raw gain difference of 32 to a corrected difference of 26
points, it appears that regression effects are not sufficient to account
for the discrepancy in gain across the score range for this particular
sample of students.
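The arithmetic above can be reproduced for any pretest score. The Python sketch below uses the two prediction equations quoted from Tables B-7 and B-9. The function names are ours, introduced only for illustration.

```python
def predicted_baseline(atot):
    """No-change baseline (Table B-7): BTOT = 32.339 + .969 x ATOT."""
    return 32.339 + 0.969 * atot


def predicted_posttest(atot):
    """Posttest prediction (Table B-9): CTOT = 119.103 + .841 x ATOT."""
    return 119.103 + 0.841 * atot


def raw_gain(atot):
    """Predicted posttest minus the pretest score itself."""
    return predicted_posttest(atot) - atot


def gain_from_baseline(atot):
    """Predicted posttest minus predicted no-change baseline."""
    return predicted_posttest(atot) - predicted_baseline(atot)


for score in (300, 500):
    print(score, round(raw_gain(score), 1), round(gain_from_baseline(score), 1))
```

Comparing the two columns of output shows how much of the difference in raw gain between low and high scorers survives once the baseline is taken into account.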
Analyzing Subtests

In the example, the request for analysis of total scores is followed
by the analysis request for the Listening Comprehension subtest:

SCATTERGRAM    B1 (20,70), C1 (20,70) WITH A1 (20,70)*

STATISTICS     ALL                              (Table B-10)

The resulting plots and statistics are given in Tables B-11 to B-14.
Again, the plots are quite linear, with a large displacement from the B1
vs. A1 plot to the C1 vs. A1 plot, suggesting considerable real growth
over the time interval from B1 to C1.

The equation of the baseline is

B1 = 10.959 + .782 A1    (Table B-12. Find it! It's not underlined
this time.)

For the posttest, the prediction equation is

C1 = 22.167 + .650 A1    (Table B-14)

A student with a pretest score of 30 would thus be expected to
achieve a B1 score of

10.96 + .782 x 30 = 34.42, and a C1 score of

22.17 + .650 x 30 = 41.67, for an estimated gain of

7.25.

A student with a pretest score of 50 would be expected to achieve
a B1 score of

10.96 + .782 x 50 = 50.06, and a C1 score of

22.17 + .650 x 50 = 54.67, for an estimated gain of

4.61.

These lines have been drawn on the scatterplots (B-11 and B-13) and
contrasted with the dashed raw-gain baseline. With Listening
Comprehension, as with TOEFL total, it appears that the initially
low-scoring students did in fact gain more in this class than did those
who started with higher scores, even after measurement errors and practice
effects are taken into account. The estimated gain is positive across the
scale, even though "raw gain" is negative for students with pretest scores
above 50.

*The (20,70) scaling is again to improve the readability of the graph.
Without it, the A1 axis would be in units of 27.0, 30.3, 33.6, etc.
The next analysis request calls for information about the Structure
and Written Expression subtest:

SCATTERGRAM    B2 (20,70), C2 (20,70) WITH A2 (20,70)

STATISTICS     ALL                              (Table B-15)

The resulting plots (Tables B-16 and B-18) show an unexpectedly
tighter scatter of points for C2 vs. A2 than for B2 vs. A2, even though C2
and A2 are more greatly separated in time. The higher correlation, .803,
for C2 and A2 vs. .760 for B2 and A2 confirms this visual impression. The
baseline equation is

B2 = 11.247 + .765 x A2    (Table B-17)

The equation predicting posttest from pretest is

C2 = 14.541 + .742 x A2    (Table B-19)

A student with a Structure and Written Expression pretest of 30 would
have an expected baseline score

B2 = 11.247 + .765 x 30 = 34.20 and posttest

C2 = 14.541 + .742 x 30 = 36.80, for an estimated gain of 2.60.

A student with a Structure and Written Expression pretest score of 50
would have an expected baseline of

B2 = 11.247 + .765 x 50 = 49.50 and posttest

C2 = 14.541 + .742 x 50 = 51.64, for an estimated gain of 2.14,

only .46 points less than the gain expected for a student with a low
pretest score. Gains for Structure and Written Expression are thus quite
uniform across the score range.
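Because the baseline and the posttest prediction are both straight lines, the estimated gain is itself a straight-line function of the pretest score: its intercept is the difference of the two intercepts and its slope is the difference of the two slopes. The Python sketch below illustrates this with the Listening Comprehension and Structure and Written Expression constants quoted above. The function names are hypothetical.

```python
def gain_line(baseline, posttest):
    """Return (intercept, slope) of gain = posttest line minus baseline line."""
    (a_base, b_base), (a_post, b_post) = baseline, posttest
    return a_post - a_base, b_post - b_base


def gain_at(line, pretest):
    """Estimated gain from baseline at a given pretest score."""
    intercept, slope = line
    return intercept + slope * pretest


# (intercept, slope) pairs as quoted in the text for each subtest.
LISTENING = gain_line((10.959, 0.782), (22.167, 0.650))
STRUCTURE = gain_line((11.247, 0.765), (14.541, 0.742))

for name, line in (("Listening", LISTENING), ("Structure", STRUCTURE)):
    drop = gain_at(line, 30) - gain_at(line, 50)
    print(name, round(gain_at(line, 30), 2), round(gain_at(line, 50), 2),
          round(drop, 2))
```

A gain slope near zero, as for Structure and Written Expression, means that gains are nearly uniform across the score range, while a clearly negative gain slope, as for Listening Comprehension, means that low scorers gain more.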

Again, the prediction lines have been added to the scatterplot.

The relationship of pretest, reliability test, and posttest for
Reading Comprehension and Vocabulary is obtained with the request:

SCATTERGRAM    B3 (20,70), C3 (20,70) WITH A3 (20,70)
STATISTICS     ALL                              (Table B-20)

The resulting output is reproduced in Tables B-21 to B-24.
The equation for the baseline is

B3 = 4.607 + .940 A3 and for the posttest

C3 = 9.259 + .889 A3.

According to these formulas, a student with a Reading Comprehension
and Vocabulary pretest score of 30 would have a predicted reliability test
score of

4.607 + .940 x 30 = 32.79 and posttest

9.259 + .889 x 30 = 35.93, for an estimated gain of

3.14.

A student with a pretest score of 50 would have a predicted
reliability test score of

4.607 + .940 x 50 = 51.59 and posttest

9.259 + .889 x 50 = 53.72, for a gain of

2.13.

Again, these estimated gains are not strikingly different across the score
range.

After the balance of the analysis request cards, the deck ends with a
FINISH card.

The computer system used for the sample analysis requires a card
after FINISH, but this end-of-job signal may be different at another
computer facility.

The complete listing of the deck for the sample run is given in
Appendix A. It may be useful to punch these cards and to perform a test
run to check that the procedures are compatible with your version of SPSS.
If the sample run works but your real data analysis does not, check
carefully for keypunch errors, missing punctuation, and cards out of
order. Computers are ridiculously literal contraptions, and will not fill
in an omitted comma in a set of instructions.
Interpreting Patterns of Change

If the reliability test is given within a week after the pretest,
it is probably reasonable to assume that there has not been enough time
for a significant change to result from instruction. This does not mean
that individuals' scores are expected to be identical from pretest to
reliability test: because of measurement error, neither test is a
perfectly reliable indicator of true score, and the correlation between
the tests, r, will be less than one because of this. In addition,
small changes due to practice effects and increased comfort with the
testing situation will take place even in brief intervals between test
administrations. Thus, the group mean may well go up, and the test
variance may change in the process of establishing our no-change baseline.
The baseline is more accurately thought of as little influenced by the
important sources of change (instructional programs) that we are studying.

In the case of our real data, the slope of the line predicting BTOT
from ATOT, .969, is almost 1.0, and is greater than the correlation, .922.
This is an indication that the variance of test B is greater than that of
test A. In fact, the ratio of the standard deviation of B to that of A
must be .969/.922 = 1.05 (as can also be obtained from Table B-2), so that
the variance has increased by (1.05)², a 10.25 percent increase from
pretest to reliability test. Thus, although regression to the mean does
take place by about 8 percent (1 - .922), it is almost offset by the 5
percent increase in standard deviation. This increase in spread of scores
suggests that initially higher-scoring students benefitted more from the
experience of taking the pretest, or learned more during the intervening
week before the reliability test. For the total scores of this particular
sample, using the no-change baseline will not yield conclusions very
different from those obtained from using raw gain. The standard deviation
of CTOT, the posttest, is only 97 percent that of the pretest, ATOT, and
the slope of CTOT on ATOT is .841. Overall change shows a slight negative
correlation with pretest score, and total score growth from a no-change
baseline with slope very nearly 1.0 remains greater for those with
initially low scores. The ratio of posttest slope to reliability test
slope is .841/.969 = .868.
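The reasoning in this paragraph rests on the identity slope = r x (SD of the predicted test / SD of the pretest), so that dividing a slope by the corresponding correlation gives the ratio of standard deviations. The Python sketch below applies that identity to the rounded values quoted in the text. The function name is ours, and because the inputs are rounded the results agree with the text only to about two decimal places.

```python
def sd_ratio(slope, correlation):
    """Ratio of the predicted test's SD to the pretest's SD (slope / r)."""
    return slope / correlation


# Total scores, reliability test vs. pretest: slope .969, correlation .922.
total_ratio = sd_ratio(0.969, 0.922)
variance_ratio = total_ratio ** 2

# Listening Comprehension: B1 vs. A1 and C1 vs. A1.
listening_b = sd_ratio(0.782, 0.794)
listening_c = sd_ratio(0.650, 0.695)

print(round(total_ratio, 3), round(variance_ratio, 3))
print(round(listening_b, 3), round(listening_c, 3))
```

A ratio above 1 indicates that scores have spread out since the pretest, and a ratio below 1 that they have drawn together.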

Let us examine the subscores to determine if this pattern is the case
for the separate parts of the test.

The Listening Comprehension subscores are denoted A1, B1, and C1.
The correlation of B1 with A1 is .794, and the slope is .782. The
variance of the Listening Comprehension section changes very little
from pretest to B1, the reliability test, with the standard deviation
decreasing by only about 1.5 percent. The slope of C1 vs. A1 is .650,
and the correlation is .695. The standard deviation of C1 thus shows
considerable further decrease, to only 93.5 percent that of A1, and this
decrease indicates that change is correlated negatively with pretest score
to a considerable extent. The ratio of posttest slope to reliability test
slope is relatively low: .650/.782 = .831. This comparatively flat
posttest slope yields high growth estimates for low pretest scorers, and
low growth estimates for those with high pretest scores.

It appears that much of the apparent greater growth of low scorers on
total test scores is due to this pattern for the Listening Comprehension
subscores.

The Structure and Written Expression scores, A2, B2, and C2, exhibit
a different pattern. The slope relating B2 to A2, .765, is almost
identical to the slope of C2 vs. A2, .742. The correlation of B2 and A2,
.760, is essentially the same as the slope, showing that no change in
variance occurred between the pretest and the reliability test.

The ratio of the B2 and C2 slopes, .97, tells us that growth measured
from the baseline, B2, to posttest, C2, is essentially uniform across the
score range for Structure and Written Expression, with initially
high-scoring students gaining nearly exactly as much as do lower-scoring
students when measurement error and practice effects have been taken into
account. This occurs despite the decrease in standard deviation from
pretest to posttest, the standard deviation of C2 being only 92.5 percent
of the standard deviation of A2.

The only way that a decreasing standard deviation can be associated
with constant gains from the baseline to the posttest is for these gains
to be positively associated with pretest scores, but negatively associated
with a component of the variance of the reliability test that is not
related to either pretest or posttest. Indeed, the A2 - C2 correlation,
.803, is higher than the A2 - B2 correlation, .760, suggesting that the
"lost" variance in C2 was not related to pretest variance. One mechanism
for such a result would be a short-term practice effect for some students
on the reliability test, which was wiped out because of additional
"test-wiseness" among all students by the time of the posttest.

The Reading and Vocabulary scores, A3, B3, and C3, show an almost 10
percent increase in standard deviation from pretest to reliability test,
and a slope of .940, again approaching 1.0. Unlike the other subtests,
posttest standard deviation remains about 6 percent greater than that of
the pretest, rather than dropping to only about 93 percent of that value.

The correlation of A3 with C3, .834, is only slightly less than that
of A3 with B3, .858, and the slope of C3 vs. A3, .889, is .95 times the
slope of B3 vs. A3. Again, as with Structure and Written Expression,
growth is almost uniform across the score scale, with high-scoring
students gaining only slightly less than low scorers, after allowing for
the effects of measurement error.

The general tendency for the reliability test to have greater
variance than the posttest suggests that differential familiarization
effects do take place in short-term retesting, temporarily adding variance
that "washes out" over the longer term. Although such additional variance
can depress correlations between pretest and reliability test, it
interferes much less with regression lines, affecting their standard error
rather than their slope.

In any of these cases, the regression line predicting Test 2 score
from Test 1 score determines a baseline expectation consisting of those
average observed-score changes that are attributable to measurement error
and to test practice effects. We now get on with teaching English, and
administer a posttest at the point at which we wish to evaluate growth.
We count as change due to instruction neither raw gain from pretest, nor
raw gain from the reliability test, but the difference between observed
posttest score and the expected Test 2 score predicted for each pretest
score by our regression equation, which gives a predicted score based on
experience with these students for each pretest score. If no further
change due to instruction has taken place since the reliability testing,
we do not expect individual students to achieve scores identical to their
scores on the reliability test (measurement error, again), but we do
expect students in a given pretest score range to have posttest scores
clustering around the same prediction line obtained from the reliability
administration.
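Applied to individual students, this definition amounts to subtracting the baseline prediction for each student's pretest score from that student's observed posttest score. The Python sketch below uses the total-score baseline quoted from Table B-7. The three students and the function names are hypothetical, chosen only to show the calculation.

```python
def expected_no_change(pretest, intercept=32.339, slope=0.969):
    """Expected Test 2 score from the no-change baseline for total scores."""
    return intercept + slope * pretest


def instruction_gain(pretest, posttest):
    """Observed posttest minus the baseline expectation for that pretest."""
    return posttest - expected_no_change(pretest)


# Hypothetical (pretest, posttest) total scores for three students.
students = [(300, 380), (400, 455), (500, 540)]
gains = [instruction_gain(pre, post) for pre, post in students]
mean_gain = sum(gains) / len(gains)

for (pre, post), gain in zip(students, gains):
    print(pre, post, round(gain, 1))
print("mean gain from baseline:", round(mean_gain, 1))
```

Individual values will scatter because of measurement error, so the group mean (or the mean within a pretest score range) is the more dependable figure.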

If we wished to be extremely conservative, we could choose to assume
that any true changes observed from pretest to reliability test would have
happened again without instruction, and double those changes to obtain an
expectation for a third testing. This would be unreasonable, however,
since test familiarization effects are not likely to operate strongly
among those already familiar with the test. The major advantage of the
reliability testing is to enable us to estimate reliability, and likely
regression effects, in our particular student group. Unless our students
are representative of all students who take TOEFL, reliability and
probable regression effects are likely to differ from the TOEFL Manual
statistics, based on samples of all candidates. However, if we
consistently obtain similar baseline prediction equations over several
semesters, and if the entering population in our program does not change
in native language background or distribution of proficiency level, we can
eventually dispense with a new reliability testing for each group and use
the equation developed for previous groups, giving a new reliability
testing only occasionally, to check the continuing validity of our local
prediction equations.
Table B-1

SPSS BATCH SYSTEM                                     08/10/81   PAGE 1

SPSS FOR OS/360, VERSION H, RELEASE 9.0

CURRENT DOCUMENTATION FOR THE SPSS BATCH SYSTEM

ORDER FROM MCGRAW-HILL: SPSS, 2ND ED. (PRINCIPAL TEXT)
                        SPSS UPDATE 7-9 (USE W/SPSS,2ND FOR REL. 7, 8, 9)
                        SPSS POCKET GUIDE, RELEASE 9
                        SPSS PRIMER (BRIEF INTRO TO SPSS)
ORDER FROM SPSS INC.:   SPSS STATISTICAL ALGORITHMS
                        KEYWORDS: THE SPSS INC. NEWSLETTER

DEFAULT SPACE ALLOCATION..     ALLOWS FOR..   102 TRANSFORMATIONS
WORKSPACE    71680 BYTES                      409 RECODE VALUES + LAG VARIABLES
TRANSPACE    10240 BYTES                     1641 IF/COMPUTE OPERATIONS

1 RUN NAME         LANGUAGE GAIN ANALYSIS
2 VARIABLE LIST    A1,A2,A3,ATOT,B1,B2,B3,BTOT,C1,C2,C3,CTOT
3 INPUT MEDIUM     CARD
4 INPUT FORMAT     FIXED(4X,3F2.0,1F3.0/4X,3F2.0,1F3.0/4X,3F2.0,1F3.0)

ACCORDING TO YOUR INPUT FORMAT, VARIABLES ARE TO BE READ AS FOLLOWS

VARIABLE   FORMAT   RECORD   COLUMNS
A1         F 2.0      1       5-  6
A2         F 2.0      1       7-  8
A3         F 2.0      1       9- 10
ATOT       F 3.0      1      11- 13
B1         F 2.0      2       5-  6
B2         F 2.0      2       7-  8
B3         F 2.0      2       9- 10
BTOT       F 3.0      2      11- 13
C1         F 2.0      3       5-  6
C2         F 2.0      3       7-  8
C3         F 2.0      3       9- 10
CTOT       F 3.0      3      11- 13

THE INPUT FORMAT PROVIDES FOR 12 VARIABLES. 12 WILL BE READ.
IT PROVIDES FOR 3 RECORDS ('CARDS') PER CASE. A MAXIMUM OF 13 'COLUMNS' ARE USED ON A RECORD.

5 N OF CASES       98
6 PEARSON CORR     A1 TO CTOT
7 OPTIONS          5
8 STATISTICS       1

***** PEARSON CORR PROBLEM REQUIRES 3168 BYTES WORKSPACE *****

9 READ INPUT DATA
Table B-2

LANGUAGE GAIN ANALYSIS                                08/10/81   PAGE 2

FILE NONAME (CREATION DATE = 08/10/81)

VARIABLE   CASES   MEAN          STD DEV
A1         98      (illegible)   (illegible)
A2         98      (illegible)   (illegible)
A3         98      (illegible)   (illegible)
ATOT       98      (illegible)   (illegible)
B1         98      (illegible)   (illegible)
B2         98      (illegible)   (illegible)
B3         98      39.7551       7.2355
BTOT       98      (illegible)   (illegible)
C1         98      (illegible)   (illegible)
C2         98      (illegible)   6.8005
C3         98      (illegible)   (illegible)
CTOT       98      455.3367      59.9381



Table B-3

LANGUAGE GAIN ANALYSIS                                08/10/81   PAGE 3
FILE NONAME (CREATION DATE = 08/10/81)

- - - - - PEARSON CORRELATION COEFFICIENTS - - - - -

[Correlation matrix of A1, A2, A3, ATOT, B1, B2, B3, BTOT, C1, C2, C3,
and CTOT (rows) with A1 through C2 (columns). The individual
coefficients are not legible in this reproduction.]

* - SIGNIF. LE .01     ** - SIGNIF. LE .001     (99.0000 IS PRINTED IF A COEFFICIENT CANNOT BE COMPUTED)
Table B-4

LANGUAGE GAIN ANALYSIS                                08/10/81   PAGE 4

FILE NONAME (CREATION DATE = 08/10/81)

- - - - - PEARSON CORRELATION COEFFICIENTS - - - - -

[Continuation of the correlation matrix: A1 through CTOT (rows) with C3
and CTOT (columns). Most coefficients are not legible in this
reproduction.]

* - SIGNIF. LE .01     ** - SIGNIF. LE .001
Table B-5

LANGUAGE GAIN ANALYSIS                                08/10/81   PAGE 5

CPU TIME REQUIRED..    0.69 SECONDS

10 SCATTERGRAM     BTOT(250,650),CTOT(250,650) WITH ATOT(250,650)
11 STATISTICS      ALL

***** GIVEN WORKSPACE ALLOWS FOR (illegible) CASES FOR SCATTERGRAM PROBLEM *****



Table B-6

LANGUAGE GAIN ANALYSIS                                08/10/81   PAGE 6

FILE NONAME (CREATION DATE = 08/10/81)

SCATTERGRAM OF   (DOWN) BTOT   (ACROSS) ATOT

[Scatterplot of BTOT (vertical axis) against ATOT (horizontal axis), both
axes scaled from 250.00 to 650.00. The plotted points are not reproduced
here.]



Table B-7

LANGUAGE GAIN ANALYSIS                                08/10/81   PAGE 7

STATISTICS..

CORRELATION (R)-    0.92235    R SQUARED      -   0.85075    SIGNIFICANCE   -  0.00000
STD ERR OF EST -   25.11263    INTERCEPT (A)  -  32.33932    SLOPE (B)      -  0.96865
PLOTTED VALUES -         98    EXCLUDED VALUES-         0    MISSING VALUES -        0

********* IS PRINTED IF A COEFFICIENT CANNOT BE COMPUTED.
Table B-8

LANGUAGE GAIN ANALYSIS                                08/10/81   PAGE 8

FILE NONAME (CREATION DATE = 08/10/81)

SCATTERGRAM OF   (DOWN) CTOT   (ACROSS) ATOT

[Scatterplot of CTOT (vertical axis) against ATOT (horizontal axis), both
axes scaled from 250.00 to 650.00. The plotted points are not reproduced
here.]



Table B-9

LANGUAGE GAIN ANALYSIS                                08/10/81   PAGE 9

STATISTICS..

CORRELATION (R)- (illegible)   R SQUARED      - (illegible)  SIGNIFICANCE   -  0.00000
STD ERR OF EST - (illegible)   INTERCEPT (A)  - 119.10332    SLOPE (B)      -  0.84105
PLOTTED VALUES -         98    EXCLUDED VALUES-         0    MISSING VALUES -        0

********* IS PRINTED IF A COEFFICIENT CANNOT BE COMPUTED.



Table B-10

LANGUAGE GAIN ANALYSIS                                08/10/81   PAGE 10

CPU TIME REQUIRED..    (illegible) SECONDS

12 SCATTERGRAM     B1(20,70),C1(20,70) WITH A1(20,70)
13 STATISTICS      ALL

***** GIVEN WORKSPACE ALLOWS FOR (illegible) CASES FOR SCATTERGRAM PROBLEM *****
Table B-11

LANGUAGE GAIN ANALYSIS

FILE NONAME (CREATION DATE = 08/10/81)

SCATTERGRAM OF   (DOWN) B1

[Scatterplot of B1 (vertical axis). The plot is not reproduced here.]
Appendix C

Comparing Two Groups

On some occasions, we wish to compare two or more groups to determine
if the growth rate for students of a particular language or educational
background is typical of that for the total group, or to compare growth of
different groups across semesters or across alternative ESL curricula.

This appendix presents two approaches to this problem. The first
approach is simply to redo, for the subgroup, the regression analysis
described in Appendix B and to compare the resulting equations and graphs
with those of the total group. No tests of statistical significance are
employed in this comparison, since the object is to determine whether any
observed difference is large enough to be of practical significance to the
program, rather than to assess probability. If tests of statistical
significance are desired, the second approach may be employed. This
method, analysis of covariance (ANCOVA), is also based on regression and
yields an estimate of the difference between two groups at the mean
pretest score, and of the probability that a difference this large could
occur by chance.

Comparing Regression Lines

To illustrate this method, we use the subgroup of 30 returning
students from the group of 98 ESL students discussed previously. Table
C-1 shows the first page of the regression output, giving the control
cards, format, and first analysis request. The 30 cases representing
the returning students were separated from the total data deck for this
run, so the only change from the control cards for the total group is in
the number of cases, N = 30. With a larger data deck, it would be
more convenient to change the input format to read the variable RET in
column 79, and to use the SPSS "SELECT IF RET = 1" option to read the
entire deck but to process only these 30 cases.

Table C-2 gives the means and standard deviations of the subtests and
total tests. We see that the mean of pretest ATOT, 409.93, is about ten
points higher than that of the total group and the mean of posttest CTOT,
458.5, is about three points higher than that of the total group. For
this analysis, we ignore the reliability test B, and focus on
comparability of changes from pretest to posttest.

Table C-3 shows the graph of posttest on pretest, and Table C-4 gives
the intercept, 113.63, and slope, .841, of the regression equation.
Comparing this equation,

CTOT = 113.63 + .841 x ATOT

with that for the total group,

CTOT = 119.10 + .841 x ATOT,
we see that the slopes are identical, but that for any given pretest
score, the returning students gain, on the average, about 5.5 points
less than do all students. This does not appear to be a large enough
difference to be of practical significance. If we were to do a separate
analysis on the 68 new students, we would of course find that their
intercept was a bit higher than 119, since the total group is the
combination of the returning students and the new students. However, even
a difference of eight total points between the two groups is not large.

Moving to the Listening Comprehension test, Table C-5 shows the graph
and Table C-6 the constants for the regression of C1 on A1 for returning
students. The equation,

C1 = 13.40 + .799 x A1, appears different from that for the

total group:

C1 = 22.17 + .650 x A1.

For a pretest Listening Comprehension score of 30, the returning
group would have a predicted posttest score of 37.4, and for a pretest
score of 50, a posttest score of 53.4. For the same two pretest
scores, the total group's predicted posttest scores are 41.7 and 54.7,
respectively.

Although a five-point difference on total scores is only
one-eightieth of the mean score, and is not educationally significant,
a four-point difference on the Listening Comprehension subtest for
low-scoring students is about one-twelfth of the mean [illegible]
significance, it suggests that students who remain low on Listening
Comprehension after a semester of instruction continue to learn at a
lower than average rate in the following semester. This sorting effect
on rate of learning of previous instruction helps to explain the higher
slope for the returning group: those who learned more in the first
semester, and who thus had higher pretest scores for the second semester,
continue to progress at a faster than average rate, but still do not show
as large a raw gain as new students, unless their pretest scores are 60 or
higher.
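The comparison of the two Listening Comprehension lines can be reproduced directly from the constants quoted above. The Python sketch below evaluates both lines at pretest scores of 30 and 50 and also locates the pretest score at which the two lines cross. The names used are hypothetical, and the crossing point is simple algebra on the rounded constants rather than a figure taken from the printout.

```python
# (intercept, slope) of the C1-on-A1 prediction lines quoted in the text.
RETURNING = (13.40, 0.799)
TOTAL_GROUP = (22.17, 0.650)


def predict(line, pretest):
    """Predicted posttest score for a pretest score on the given line."""
    intercept, slope = line
    return intercept + slope * pretest


def crossing_point(line_1, line_2):
    """Pretest score at which two non-parallel lines predict the same posttest."""
    (a1, b1), (a2, b2) = line_1, line_2
    return (a2 - a1) / (b1 - b2)


for pretest in (30, 50):
    gap = predict(TOTAL_GROUP, pretest) - predict(RETURNING, pretest)
    print(pretest, round(gap, 2))

print("lines cross near pretest =", round(crossing_point(RETURNING, TOTAL_GROUP), 1))
```

Below the crossing point the returning students are predicted to score lower at posttest than the total group; the gap narrows as pretest scores rise toward about 60.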

Tables C-7 and C-8 give the graph and statistics for the prediction
of Structure and Written Expression posttest scores from returning
students' pretest scores. The equation,

C2 = 9.93 + .875 x A2, again has a higher slope and lower
intercept than does the equation for the total group:

C2 = 11.25 + .765 x A2.

The difference is qualitatively unlike that for Listening
Comprehension, however. Returning students with pretest scores of 30 and
50 would be predicted to achieve posttest scores of 36.2 and 53.7,
respectively. New students with the same pretest scores would be expected
to reach average posttest scores of 34.2 and 49.5.

In this case, returning students gain from two to four more raw-score
points than do new students, partly offsetting the lesser gain observed
for Listening Comprehension scores.

Finally, the Reading Comprehension and Vocabulary scores and
constants for the returning students are given in Tables C-9 and C-10.
The resulting prediction equation is

C3 = 8.17 + .903 x A3. This is almost identical to the

equation for the total group:

C3 = 9.26 + .889 x A3.

Returning students with Reading Comprehension and Vocabulary pretest
scores of 30 and 50 would have predicted scores of 35.3 and 53.3,
respectively, as compared to predicted scores of 35.9 and 53.7 for new
students with the same pretest scores.

Comparison of the pretest-posttest relationships for returning
students with those for new students has revealed that although slopes
are identical for total test scores, and the lower total raw gain for
returning students is not practically important, this total test-score
pattern holds only for the Reading Comprehension and Vocabulary subtest.
In Listening Comprehension, returning students with a given pretest score
gained substantially less than did new students with the same score, while
in Structure and Written Expression, returning students gained more than
did new students.

If statistical tests of significance are desired for these differences,
or for differences arising from the comparison of data from two or
more different curricula or from two or more different semesters, the
SPSS ANCOVA analysis offers a convenient method. ANCOVA uses one or more
predictors (covariates) in regression equations to explain as much of the
posttest variance as possible. The "leftover" or residual variance that
cannot be explained by such covariates as pretest score, years of language
study, or language group is then subjected to traditional analysis of
variance. This procedure compares the variance among group means with the
residual variation within groups to estimate the probability that observed
group differences could have occurred by chance. The resulting "F"
statistic has a probability distribution that depends on the number of
groups and on the number of individuals within groups. An important
assumption of the classical analysis of covariance is that regression lines
are parallel within groups. Although we have seen that the observed
regression lines for new and returning students are not strictly parallel
for subtests 1 and 2, the lack of parallelism is not significant and does
not violate the parallelism assumption seriously.
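
The logic of the test can be sketched for the simplest case of one covariate. The Python below is an illustration under stated assumptions, not the SPSS algorithm itself: it compares the residual variation left by a single regression line for everyone with the residual variation left when each group has its own intercept but a common slope. The function names and the demonstration scores are invented for the example.

```python
def corrected_sums(x, y):
    """Corrected sums of squares and cross-products for one set of cases."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxx, syy, sxy


def ancova_f(pre, post, group):
    """F test for a group effect on post after adjusting for one covariate.

    Assumes parallel within-group regression lines.
    Returns (F, df_effect, df_residual).
    """
    labels = sorted(set(group))
    # Residual SS when all cases share one regression line (no group effect).
    sxx_t, syy_t, sxy_t = corrected_sums(pre, post)
    ss_reduced = syy_t - sxy_t ** 2 / sxx_t
    # Residual SS with a common slope and a separate intercept per group.
    sxx_w = syy_w = sxy_w = 0.0
    for label in labels:
        xs = [x for x, g in zip(pre, group) if g == label]
        ys = [y for y, g in zip(post, group) if g == label]
        sxx, syy, sxy = corrected_sums(xs, ys)
        sxx_w += sxx
        syy_w += syy
        sxy_w += sxy
    ss_full = syy_w - sxy_w ** 2 / sxx_w
    df_effect = len(labels) - 1
    df_residual = len(pre) - len(labels) - 1
    f = ((ss_reduced - ss_full) / df_effect) / (ss_full / df_residual)
    return f, df_effect, df_residual


if __name__ == "__main__":
    # Hypothetical scores, invented for illustration only.
    pre = [30, 40, 50, 60, 35, 45, 55, 65]
    post = [36, 44, 55, 63, 38, 49, 57, 68]
    group = [0, 0, 0, 0, 1, 1, 1, 1]
    f, df1, df2 = ancova_f(pre, post, group)
    print(f"F({df1}, {df2}) = {f:.3f}")
```

With two groups and one covariate the residual degrees of freedom are N - 3, which is why the 98-case analyses in the appendix report 95.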

Table C-11 shows the first page of output of an SPSS ANCOVA analysis
using two groups: the 68 new students of our earlier analyses, and the 30
returning students.

The input format card has been changed to read column 79 of card 1,
in which returning students are coded with the number 1. The variable
list card has been changed by adding a variable, "RET," which is 1 for
returning students and 0 otherwise.
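
For readers who keep their data in fixed-column records but work outside SPSS, the card 1 layout listed in Table C-11 can be read as sketched below. This is a Python illustration only; the function name, the sample cards, and the treatment of a blank column 79 as 0 are assumptions made for the example.

```python
# Field layout for card 1 as listed in Table C-11 (1-based, inclusive columns).
CARD1_FIELDS = {
    "A1": (5, 6),
    "A2": (7, 8),
    "A3": (9, 10),
    "ATOT": (11, 13),
    "RET": (79, 79),
}


def parse_card1(card):
    """Read the pretest scores and the returning-student flag from card 1."""
    card = card.ljust(80)  # pad short lines to a full 80-column card
    values = {}
    for name, (first, last) in CARD1_FIELDS.items():
        text = card[first - 1:last].strip()
        # Assumption: a blank field counts as zero, so unflagged cards get RET = 0.
        values[name] = int(text) if text else 0
    return values


if __name__ == "__main__":
    # Hypothetical cards: four ID columns, three subtest scores, the total,
    # and (for the returning student only) a 1 in column 79.
    returning = "0001" + "473738410" + " " * 65 + "1"
    new = "0002" + "454035400"
    print(parse_card1(returning))
    print(parse_card1(new))
```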

The first analysis request card,

ANOVA CTOT BY RET(0,1) WITH ATOT,

asks for an analysis of covariance with total posttest score CTOT as the
dependent variable, RET as the group identification code, and pretest ATOT
as the covariate. Table C-12 shows the CTOT means and numbers of the
total sample and of each subgroup. Table C-13 gives the analysis of
covariance results for the total test scores.

Although the covariate, ATOT, predicts a highly significant proportion
of the variance of the total posttest score (F = 283.88, probability
of this large a value by chance less than .0005), the main effect of
returning status predicts very little of the posttest variance (F = 1.39,
a value that could be observed by chance in almost one in four cases).
The total variance explained is significant, but only because of the
contribution of the covariate. Thus the statistical test confirms the
judgment based on the comparison of regression lines: returning status
does not significantly influence language growth predictions in this
sample. Table C-14 gives the estimated unadjusted posttest differences,
with the new group 1.4 points below the grand mean and the returning
group 3.16 points above the grand mean, and the adjusted contrasts after
taking the pretest into account, reversed to show a 5.46-point negative
weighted effect for returning students and a 2.41-point positive weighted
effect for new students. These effects are weighted by the number of
cases to sum to zero (68 x 2.41 - 30 x 5.46 = 0, within rounding).

As we have noted, this effect is not statistically significant.
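
The F ratios and the weighting can be checked by hand from the printed tables. The sketch below, in Python, takes the sums of squares and degrees of freedom from Table C-13 and the adjusted deviations from Table C-14; the variable and function names are illustrative.

```python
# Sums of squares and degrees of freedom as printed in Table C-13.
SS_COVARIATE, DF_COVARIATE = 260148.063, 1   # pretest ATOT
SS_RET, DF_RET = 1271.007, 1                 # returning status
SS_RESIDUAL, DF_RESIDUAL = 87058.688, 95


def f_ratio(ss_effect, df_effect, ss_residual, df_residual):
    """F = (effect mean square) / (residual mean square)."""
    return (ss_effect / df_effect) / (ss_residual / df_residual)


f_atot = f_ratio(SS_COVARIATE, DF_COVARIATE, SS_RESIDUAL, DF_RESIDUAL)
f_ret = f_ratio(SS_RET, DF_RET, SS_RESIDUAL, DF_RESIDUAL)

# Adjusted deviations from Table C-14, weighted by the number of cases.
weighted_sum = 68 * 2.41 + 30 * (-5.46)

if __name__ == "__main__":
    print(f"F for ATOT = {f_atot:.2f}")
    print(f"F for RET  = {f_ret:.2f}")
    print(f"weighted sum of adjusted deviations = {weighted_sum:.2f}")
```

The weighted sum differs from zero only because the deviations are printed to two decimals.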

Analysis of covariance is not restricted to a single covariate.
Table C-15 gives the control cards for analysis of covariance using both
the pretest and the reliability test as predictors. One could also use
years of prior English study or a code for native language group (e.g., 0
for Indo-European, 1 for non-Indo-European) as additional predictors. One
could not use months of English study in the U.S. as a covariate, since
it would be confounded with new/returning status and would explain away
the very effect that we wish to study.

Table C-16 repeats the means for the two groups, and Table C-17 gives
the analysis of covariance table. We note that both covariates contribute
significantly to predicting the posttest, with BTOT, being closer in time
to the posttest, contributing more unique predictive power than does ATOT,
although their common component, F = 203.57, carries the burden of the
prediction. The addition of BTOT as a covariate effectively wipes out any
effect after the first week of returning status (F = 0, probability = 1).
The contrasts in Table C-18 show that the adjusted mean differences have
dropped to a negligible .13 - (-.30) = .43 points.

Table C-19 gives the control cards for the single-covariate analysis
of the Listening Comprehension subtest, Table C-20 the means, and Table
C-21 the analysis of variance table. Here again, the statistical test
partly confirms our intuitive judgment. The effect of returning status on
Listening Comprehension is statistically significant (p = .015). Table
C-22 shows a covariance-adjusted effect of .79 - (-1.79) = 2.58 points in
favor of new students. Adding the reliability test B1 as an additional
covariate, however (Tables C-23 to C-26), makes the effect of new vs.
returning status drop to statistical insignificance (p = .279), with an
adjusted effect (Table C-26) of only .98.

The analyses of the Structure and Written Expression subtest comprise
Tables C-27 to C-34. Using A2 as a covariate, the effect of group
(returning vs. new) is not significant (F = .297, p = .587), and the
estimated effect size is only .59 points (Table C-30). Adding B2, the
reliability test, as an additional covariate drops the value of F to .052
(p = .819), and reduces the estimated effect to a negligible .19.


Tables C-35 to C-42 give the analyses for the Reading Comprehension
and Vocabulary subtest. As was noted by comparing regression lines,
differences are slight, and neither the analysis with pretest only
as covariate (F = .888, p = .348) nor that with both pretest and
reliability tests as covariates (F = .092, p = .763) approaches statistical
significance.

The analysis of covariance thus offers a convenient test of group
differences, with considerable increase in power afforded by using
covariates to remove what would otherwise have been "error" variance.
It should be kept in mind that the analysis of covariance presupposes
random assignment to groups, however, and is not capable of correcting
for preexisting differences among groups selected on some
ability-correlated criterion. In cases in which the assumption of parallel
within-group regression lines is violated, generalizations of analysis
of covariance, which fit a group-by-pretest interaction to the data
(Rogosa, 1981), may be considered.
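
One common way to examine the parallelism assumption is to test the group-by-pretest interaction directly, by comparing a model with a separate slope for each group against a model with one common slope. The Python sketch below illustrates that comparison for a single covariate; it is offered as an illustration under those assumptions and is not the generalized procedure cited above. The function names and demonstration scores are invented.

```python
def corrected_sums(x, y):
    """Corrected sums of squares and cross-products for one set of cases."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxx, syy, sxy


def slope_homogeneity_f(pre, post, group):
    """F test of the group-by-pretest interaction (equal within-group slopes).

    Returns (F, df_interaction, df_residual).
    """
    labels = sorted(set(group))
    sxx_w = syy_w = sxy_w = 0.0
    ss_separate = 0.0
    for label in labels:
        xs = [x for x, g in zip(pre, group) if g == label]
        ys = [y for y, g in zip(post, group) if g == label]
        sxx, syy, sxy = corrected_sums(xs, ys)
        # Residual SS around this group's own regression line.
        ss_separate += syy - sxy ** 2 / sxx
        sxx_w += sxx
        syy_w += syy
        sxy_w += sxy
    # Residual SS when all groups are forced to share one slope.
    ss_common = syy_w - sxy_w ** 2 / sxx_w
    df_interaction = len(labels) - 1
    df_residual = len(pre) - 2 * len(labels)
    f = ((ss_common - ss_separate) / df_interaction) / (ss_separate / df_residual)
    return f, df_interaction, df_residual


if __name__ == "__main__":
    # Hypothetical scores, invented for illustration only.
    pre = [30, 40, 50, 60, 35, 45, 55, 65]
    post = [36, 44, 55, 63, 38, 49, 57, 68]
    group = [0, 0, 0, 0, 1, 1, 1, 1]
    f, df1, df2 = slope_homogeneity_f(pre, post, group)
    print(f"F({df1}, {df2}) = {f:.3f}")
```

A small F suggests that a common slope is adequate; a large F suggests that the group difference depends on pretest level and should be described at several pretest scores rather than by a single adjusted effect.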

In comparing different groups, it is important to check for differential
attrition. If subjects have dropped out of both groups before
posttest, it is necessary to check that the pretest scores of those who
left each group are comparable. If they are not, some cause of dropping
out, linked to test scores, may have been operating, and any difference
in outcome may be attributable to this differential dropout rather than
to some positive characteristic of the program. The classical example of
this is the teacher who says, "You, you, and you, stay home tomorrow," on
the day before the posttest, but more subtle influences may operate to
produce a similar result.
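
A quick check of this kind can be sketched as follows. The Python below compares the mean pretest scores of the students who left each group; Welch's t statistic is used here only as a rough yardstick and is a choice made for this example, not a procedure taken from the manual. The function name and the scores are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def dropout_pretest_gap(dropouts_a, dropouts_b):
    """Compare the pretest scores of the dropouts from two groups.

    Returns (difference in means, Welch's t statistic).
    """
    gap = mean(dropouts_a) - mean(dropouts_b)
    standard_error = sqrt(variance(dropouts_a) / len(dropouts_a)
                          + variance(dropouts_b) / len(dropouts_b))
    return gap, gap / standard_error


if __name__ == "__main__":
    # Hypothetical pretest totals of students who left before the posttest.
    left_program_a = [380, 400, 420]
    left_program_b = [430, 450, 470]
    gap, t = dropout_pretest_gap(left_program_a, left_program_b)
    print(f"mean difference = {gap:.1f}, Welch t = {t:.2f}")
```

A large gap means the two groups lost different kinds of students, so a posttest difference between the groups cannot be credited to the program alone.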

Concluding Note

It should be stressed that we have not attempted to estimate
differing true gain scores for individual students from their
scores. The difficulties inherent in that task can (and have) filled a
book (Harris, 1963).

Rather we have followed the recommendations of Cronbach and Furby
(1970), who point out that the correlational question, "What kinds of
individuals grow more?" can be answered without estimating true gain
scores for individuals, but is best approached by studying predicted
scores.

Neither did we adopt the point of view that since measurement scales
may be arbitrarily stretched, only changes uncorrelated with pretest
("structural changes") qualify as "real" change. For example, if every
student were to gain 10 percent from his or her pretest level, posttest
scores would correlate perfectly with pretest scores, yet we would still
claim that change had occurred, and that higher scoring students had
gained more than had lower scoring students. This interpretation amounts
to assuming that units of the TOEFL scale are meaningful to users in
behavioral terms.
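
The 10 percent example can be verified numerically. In the Python sketch below the pretest scores are hypothetical and the helper pearson_r is written out only for the illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical pretest totals; every student gains 10 percent.
pretest = [400, 450, 500, 550]
posttest = [1.10 * score for score in pretest]
gains = [post - pre for pre, post in zip(pretest, posttest)]

if __name__ == "__main__":
    print(f"correlation = {pearson_r(pretest, posttest):.3f}")
    print("raw gains:", [round(g, 1) for g in gains])
```

The correlation is perfect, so none of the change is "uncorrelated with pretest," yet every student has gained and the raw gains rise with pretest level.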

Good luck with your analysis, and please communicate any problems or
suggestions for clarification to the author.

SPSS PRIMER (BRIEF INTRO TO SPSS)

DEFAULT SPACE ALLOCATION..           ALLOWS FOR..    102 TRANSFORMATIONS
WORKSPACE    71680 BYTES                             409 RECODE VALUES + LAG VARIABLES
TRANSPACE    10240 BYTES                            1641 IF/COMPUTE OPERATIONS

     1 RUN NAME        LANGUAGE GAIN ANALYSIS
     2 VARIABLE LIST   A1,A2,A3,ATOT,B1,B2,B3,BTOT,C1,C2,C3,CTOT
     3 INPUT MEDIUM    CARD

VARIABLE   FORMAT    RECORD   COLUMNS
ATOT       F 3. 0      1      11-  13
B1         F 2. 0      2       5-   6
B2         F 2. 0      2       7-   8
B3         F 2. 0      2       9-  10
BTOT       F 3. 0      2      11-  13
C1         F 2. 0      3       5-   6
C2         F 2. 0      3       7-   8
C3         F 2. 0      3       9-  10
CTOT       F 3. 0      3      11-  13

THE INPUT FORMAT PROVIDES FOR 12 VARIABLES. 12 WILL BE READ.
IT PROVIDES FOR 3 RECORDS ('CARDS') PER CASE. A MAXIMUM OF 13 'COLUMNS' ARE USED ON A RECORD.

     5 N OF CASES      30
     6 PEARSON CORR    A1 TO CTOT
     7 OPTIONS         5
     8 STATISTICS      1

***** PEARSON CORR PROBLEM REQUIRES 3168 BYTES WORKSPACE *****

     9 READ INPUT DATA



Table C-2

LANGUAGE GAIN ANALYSIS                                   08/11/81    PAGE 2
FILE NONAME (CREATION DATE = 08/11/81)

VARIABLE    CASES          MEAN     STD DEV
A1             30       47.4667      5.2115
A2             30       37.3000      5.0182
A3             30       38.2000      6.3594
ATOT           30   (illegible)     48.0423
B1             30   (illegible)      6.9765
B2             30   (illegible)      5.6793
B3             30   (illegible)      7.0987
BTOT           30      421.1333     57.9784
C1             30       51.3333      5.3584
C2             30       42.5667      5.9982
C3             30       42.6667      6.3698
CTOT           30      458.5000     48.9008






Table C-3

LANGUAGE GAIN ANALYSIS                                   08/11/81    PAGE 8
FILE NONAME (CREATION DATE = 08/11/81)
SCATTERGRAM OF (DOWN) CTOT (ACROSS) ATOT

[Scattergram not reproduced. Both axes run from 250.00 to 650.00.]



Table C-4

LANGUAGE GAIN ANALYSIS                                   08/11/81    PAGE 9

STATISTICS..
CORRELATION (R)  -     0.82651    R SQUARED       -     0.68312    SIGNIFICANCE    -   0.00000
STD ERR OF EST   - (illegible)    INTERCEPT (A)   - (illegible)    SLOPE (B)       -   0.84128
PLOTTED VALUES   -          30    EXCLUDED VALUES -           0    MISSING VALUES  -         0

'*******' IS PRINTED IF A COEFFICIENT CANNOT BE COMPUTED.



Table C-5

LANGUAGE GAIN ANALYSIS                                   08/11/81    PAGE 13
FILE NONAME (CREATION DATE = 08/11/81)
SCATTERGRAM OF (DOWN) C1 (ACROSS) A1

[Scattergram not reproduced. Both axes run from 20.00 to 70.00.]



Table C-6

LANGUAGE GAIN ANALYSIS                                   08/11/81    PAGE 14

STATISTICS..
CORRELATION (R)  -     0.77719    R SQUARED       -     0.60403    SIGNIFICANCE    -   0.00000
STD ERR OF EST   -     3.43153    INTERCEPT (A)   -    13.39855    SLOPE (B)       -   0.79919
PLOTTED VALUES   -          30    EXCLUDED VALUES -           0    MISSING VALUES  -         0

'*******' IS PRINTED IF A COEFFICIENT CANNOT BE COMPUTED.



Table C-7

LANGUAGE GAIN ANALYSIS                                   08/11/81    PAGE 18
FILE NONAME (CREATION DATE = 08/11/81)
SCATTERGRAM OF (DOWN) C2 (ACROSS) A2

[Scattergram not reproduced. Both axes run from 20.00 to 70.00.]



Table C-8

LANGUAGE GAIN ANALYSIS                                   08/11/81    PAGE 19

STATISTICS..
CORRELATION (R)  -     0.73192    R SQUARED       - (illegible)    SIGNIFICANCE    -   0.00000
STD ERR OF EST   - (illegible)    INTERCEPT (A)   - (illegible)    SLOPE (B)       -   0.87485
PLOTTED VALUES   -          30    EXCLUDED VALUES -           0    MISSING VALUES  -         0

'*******' IS PRINTED IF A COEFFICIENT CANNOT BE COMPUTED.



Table C-9

LANGUAGE GAIN ANALYSIS                                   08/11/81    PAGE 23
FILE NONAME (CREATION DATE = 08/11/81)
SCATTERGRAM OF (DOWN) C3 (ACROSS) A3

[Scattergram not reproduced. Both axes run from 20.00 to 70.00.]



Table C-10

LANGUAGE GAIN ANALYSIS                                   08/11/81    PAGE 24

STATISTICS..
CORRELATION (R)  - (illegible)    R SQUARED       -     0.81257    SIGNIFICANCE    -   0.00000
STD ERR OF EST   -     2.80576    INTERCEPT (A)   -     8.17332    SLOPE (B)       -   0.90297
PLOTTED VALUES   -          30    EXCLUDED VALUES -           0    MISSING VALUES  -         0

'*******' IS PRINTED IF A COEFFICIENT CANNOT BE COMPUTED.



Table C-11

SPSS BATCH SYSTEM                                        08/17/81    PAGE 1

SPSS FOR OS/360, VERSION H, RELEASE 9.0, JUNE 10, 1981

CURRENT DOCUMENTATION FOR THE SPSS BATCH SYSTEM
ORDER FROM MCGRAW-HILL:  SPSS, 2ND ED. (PRINCIPAL TEXT)
                         SPSS UPDATE 7-9 (USE W/SPSS,2ND FOR REL. 7, 8, 9)
                         SPSS POCKET GUIDE, RELEASE 9
                         SPSS PRIMER (BRIEF INTRO TO SPSS)
ORDER FROM SPSS INC.:    SPSS STATISTICAL ALGORITHMS
                         KEYWORDS: THE SPSS INC. NEWSLETTER

DEFAULT SPACE ALLOCATION..           ALLOWS FOR..    102 TRANSFORMATIONS
WORKSPACE    71680 BYTES                             409 RECODE VALUES + LAG VARIABLES
TRANSPACE    10240 BYTES                            1641 IF/COMPUTE OPERATIONS

     1 RUN NAME        NEW VS. CONTINUING STUDENT TOEFL ANCOVA
     2 VARIABLE LIST   A1,A2,A3,ATOT,RET,B1,B2,B3,BTOT,C1,C2,C3,CTOT
     3 INPUT MEDIUM    CARD
     4 INPUT FORMAT    FIXED(4X,3F2.0,1F3.0,65X,1F1.0/4X,3F2.0,1F3.0/4X,3F2.0,1F3.0)

ACCORDING TO YOUR INPUT FORMAT, VARIABLES ARE TO BE READ AS FOLLOWS

VARIABLE   FORMAT    RECORD   COLUMNS
A1         F 2. 0      1       5-   6
A2         F 2. 0      1       7-   8
A3         F 2. 0      1       9-  10
ATOT       F 3. 0      1      11-  13
RET        F 1. 0      1      79-  79
B1         F 2. 0      2       5-   6
B2         F 2. 0      2       7-   8
B3         F 2. 0      2       9-  10
BTOT       F 3. 0      2      11-  13
C1         F 2. 0      3       5-   6
C2         F 2. 0      3       7-   8
C3         F 2. 0      3       9-  10
CTOT       F 3. 0      3      11-  13

THE INPUT FORMAT PROVIDES FOR 13 VARIABLES. 13 WILL BE READ.
IT PROVIDES FOR 3 RECORDS ('CARDS') PER CASE. A MAXIMUM OF 79 'COLUMNS' ARE USED ON A RECORD.

     5 N OF CASES      98
     6 ANOVA           CTOT BY RET(0,1) WITH ATOT
     7 STATISTICS      ALL

'ANOVA' PROBLEM REQUIRES 182 BYTES OF SPACE.

     8 READ INPUT DATA

Table C-12

NEW VS. CONTINUING STUDENT TOEFL ANCOVA                  08/17/81    PAGE 2
FILE NONAME (CREATION DATE = 08/17/81)

* * * * *  C E L L   M E A N S  * * * * *
        CTOT
     BY RET

TOTAL POPULATION
    455.34
   (   98)

RET
         0         1
    453.94    458.50
   (   68)   (   30)


Table C-13

NEW VS. CONTINUING STUDENT TOEFL ANCOVA                  08/17/81    PAGE 3
FILE NONAME (CREATION DATE = 08/17/81)

* * * * *  A N A L Y S I S   O F   V A R I A N C E  * * * * *
        CTOT
     BY RET
   WITH ATOT

                           SUM OF                  MEAN               SIGNIF
SOURCE OF VARIATION       SQUARES     DF         SQUARE          F      OF F
COVARIATES             260148.063      1     260148.063    283.878     0.000
   ATOT                260148.063      1     260148.063    283.878     0.000
MAIN EFFECTS             1271.000      1       1271.000      1.387     0.242
   RET                   1271.007      1       1271.007      1.387     0.242
EXPLAINED              261419.063      2     130709.500    142.633     0.000
RESIDUAL                87058.688     95        916.407
TOTAL                  348477.750     97       3592.554

COVARIATE   RAW REGRESSION COEFFICIENT
ATOT            0.841

98 CASES WERE PROCESSED.
 0 CASES (  0.0 PCT) WERE MISSING.



Table C-14

NEW VS. CONTINUING STUDENT TOEFL ANCOVA                  08/17/81    PAGE 4
FILE NONAME (CREATION DATE = 08/17/81)

* * *  M U L T I P L E   C L A S S I F I C A T I O N   A N A L Y S I S  * * *
        CTOT
     BY RET
   WITH ATOT

GRAND MEAN = 455.34
                                              ADJUSTED FOR INDEPENDENTS
                             UNADJUSTED       + COVARIATES
VARIABLE + CATEGORY     N    DEV'N    ETA     DEV'N    BETA
RET
   0                   68    -1.40             2.41
   1                   30     3.16            -5.46
                                      0.04             0.06

MULTIPLE R SQUARED                                    0.750
MULTIPLE R                                            0.866



Table C-15

NEW VS. CONTINUING STUDENT TOEFL ANCOVA                  08/17/81    PAGE 5

CPU TIME REQUIRED..  (illegible) SECONDS

     9 ANOVA           CTOT BY RET(0,1) WITH ATOT,BTOT
    10 STATISTICS      ALL

'ANOVA' PROBLEM REQUIRES 266 BYTES OF SPACE.

Table C-16

NEW VS. CONTINUING STUDENT TOEFL ANCOVA                  08/17/81    PAGE 6
FILE NONAME (CREATION DATE = 08/17/81)

* * * * *  C E L L   M E A N S  * * * * *
        CTOT
     BY RET

TOTAL POPULATION
    455.34
   (   98)

RET
         0         1
    453.94    458.50
   (   68)   (   30)



Table C-17

NEW VS. CONTINUING STUDENT TOEFL ANCOVA                  08/17/81    PAGE 7
FILE NONAME (CREATION DATE = 08/17/81)

* * * * *  A N A L Y S I S   O F   V A R I A N C E  * * * * *
        CTOT
     BY RET
   WITH ATOT
        BTOT

                           SUM OF                  MEAN               SIGNIF
SOURCE OF VARIATION       SQUARES     DF         SQUARE          F      OF F
COVARIATES             283109.875      2     141554.938    203.570     0.000
   ATOT                  3282.492      1       3282.492      4.721     0.032
   BTOT                 22961.793      1      22961.793     33.021     0.000
MAIN EFFECTS                3.750      1          3.750      0.005     0.942
   RET                      3.745      1          3.745      0.005     0.942
EXPLAINED              283113.625      3    (illegible)    135.715     0.000
RESIDUAL                65364.125     94        695.363
TOTAL                  348477.750     97       3592.554

COVARIATE   RAW REGRESSION COEFFICIENT
ATOT            0.245
BTOT            0.616

98 CASES WERE PROCESSED.
 0 CASES (  0.0 PCT) WERE MISSING.



Table C-18

NEW VS. CONTINUING STUDENT TOEFL ANCOVA                  08/17/81    PAGE 8
FILE NONAME (CREATION DATE = 08/17/81)

* * *  M U L T I P L E   C L A S S I F I C A T I O N   A N A L Y S I S  * * *
        CTOT
     BY RET
   WITH ATOT
        BTOT

GRAND MEAN = 455.34
                                              ADJUSTED FOR INDEPENDENTS
                             UNADJUSTED       + COVARIATES
VARIABLE + CATEGORY     N    DEV'N    ETA     DEV'N    BETA
RET
   0                   68    -1.40             0.13
   1                   30     3.16            -0.30
                                      0.04             0.00

MULTIPLE R SQUARED                                    0.812
MULTIPLE R                                            0.901



Table C-19

NEW VS. CONTINUING STUDENT TOEFL ANCOVA                  08/17/81    PAGE 9

CPU TIME REQUIRED..  0.28 SECONDS

    11 ANOVA           C1 BY RET(0,1) WITH A1
    12 STATISTICS      ALL

'ANOVA' PROBLEM REQUIRES 182 BYTES OF SPACE.

Table C-20

NEW VS. CONTINUING STUDENT TOEFL ANCOVA                  08/17/81    PAGE 10
FILE NONAME (CREATION DATE = 08/17/81)

* * * * *  C E L L   M E A N S  * * * * *
        C1
     BY RET

TOTAL POPULATION
     51.31
   (   98)

RET
         0         1
     51.29     51.33
   (   68)   (   30)

Table C-21

NEW VS. CONTINUING STUDENT TOEFL ANCOVA                  08/17/81    PAGE 11
FILE NONAME (CREATION DATE = 08/17/81)

* * * * *  A N A L Y S I S   O F   V A R I A N C E  * * * * *
        C1
     BY RET
   WITH A1

                           SUM OF                  MEAN               SIGNIF
SOURCE OF VARIATION       SQUARES     DF         SQUARE          F      OF F
COVARIATES               1985.384      1       1985.384     94.214     0.000
   A1                    1985.384      1       1985.384     94.214     0.000
MAIN EFFECTS              129.463      1        129.463      6.144     0.015
   RET                    129.468      1        129.468      6.144     0.015
EXPLAINED                2114.852      2       1057.426     50.179     0.000
RESIDUAL              (illegible)     95         21.073
TOTAL                    4116.801     97         42.441

COVARIATE   RAW REGRESSION COEFFICIENT
A1          (illegible)

98 CASES WERE PROCESSED.
 0 CASES (  0.0 PCT) WERE MISSING.



Table C-22

NEW VS. CONTINUING STUDENT TOEFL ANCOVA                  08/17/81    PAGE 12
FILE NONAME (CREATION DATE = 08/17/81)

* * *  M U L T I P L E   C L A S S I F I C A T I O N   A N A L Y S I S  * * *
        C1
     BY RET
   WITH A1

GRAND MEAN = 51.31
                                                    ADJUSTED FOR INDEPENDENTS
                             UNADJUSTED             + COVARIATES
VARIABLE + CATEGORY     N    DEV'N         ETA      DEV'N    BETA
RET
   0                   68    (illegible)             0.79
   1                   30    (illegible)            -1.79

MULTIPLE R SQUARED                                  (illegible)
MULTIPLE R                                          (illegible)

Table C-23

NEW VS. CONTINUING STUDENT TOEFL ANCOVA                  08/17/81    PAGE 13

CPU TIME REQUIRED..  0.25 SECONDS

    13 ANOVA           C1 BY RET(0,1) WITH A1,B1
    14 STATISTICS      ALL

'ANOVA' PROBLEM REQUIRES 266 BYTES OF SPACE.

Table C-24

NEW VS. CONTINUING STUDENT TOEFL ANCOVA                  08/17/81    PAGE 14
FILE NONAME (CREATION DATE = 08/17/81)

* * * * *  C E L L   M E A N S  * * * * *
        C1
     BY RET

TOTAL POPULATION
     51.31
   (   98)

RET
         0         1
     51.29     51.33
   (   68)   (   30)

Table C-25

NEW VS. CONTINUING STUDENT TOEFL ANCOVA                  08/17/81    PAGE 15
FILE NONAME (CREATION DATE = 08/17/81)

* * * * *  A N A L Y S I S   O F   V A R I A N C E  * * * * *
        C1
     BY RET
   WITH A1
        B1

                           SUM OF                  MEAN               SIGNIF
SOURCE OF VARIATION       SQUARES     DF         SQUARE          F      OF F
COVARIATES               2727.258      2       1363.629     93.412     0.000
   A1                      29.455      1         29.455      2.018     0.159
   B1                     741.874      1        741.874     50.820     0.000
MAIN EFFECTS               17.336      1         17.336      1.188     0.279
   RET                     17.337      1         17.337      1.188     0.279
EXPLAINED                2744.594      3        914.865     62.671     0.000
RESIDUAL              (illegible)     94         14.598
TOTAL                    4116.801     97         42.441

COVARIATE   RAW REGRESSION COEFFICIENT
A1              0.130
B1              0.661

98 CASES WERE PROCESSED.
 0 CASES (  0.0 PCT) WERE MISSING.



Table C-26

NEW VS. CONTINUING STUDENT TOEFL ANCOVA                  08/17/81    PAGE 16
FILE NONAME (CREATION DATE = 08/17/81)

* * *  M U L T I P L E   C L A S S I F I C A T I O N   A N A L Y S I S  * * *
        C1
     BY RET
   WITH A1
        B1

GRAND MEAN = 51.31
                                              ADJUSTED FOR INDEPENDENTS
                             UNADJUSTED       + COVARIATES
VARIABLE + CATEGORY     N    DEV'N    ETA     DEV'N    BETA
RET
   0                   68    -0.01             0.30
   1                   30     0.03            -0.68
                                      0.00             0.07

MULTIPLE R SQUARED                                    0.667
MULTIPLE R                                      (illegible)



Table C-27

NEW VS. CONTINUING STUDENT TOEFL ANCOVA                  08/17/81    PAGE 17

CPU TIME REQUIRED..  (illegible) SECONDS

    15 ANOVA           C2 BY RET(0,1) WITH A2
    16 STATISTICS      ALL

'ANOVA' PROBLEM REQUIRES 182 BYTES OF SPACE.



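Table C-28

NEW VS. CONTINUING STUDENT TOEFL ANCOVA                  08/17/81    PAGE 18
FILE NONAME (CREATION DATE = 08/17/81)

* * * * *  C E L L   M E A N S  * * * * *
        C2
     BY RET

TOTAL POPULATION
   (illegible)
   (   98)

RET
             0             1
   (illegible)   (illegible)
   (   68)       (   30)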



Table C-29

NEW VS. CONTINUING STUDENT TOEFL ANCOVA                  08/17/81    PAGE 19
FILE NONAME (CREATION DATE = 08/17/81)

* * * * *  A N A L Y S I S   O F   V A R I A N C E  * * * * *
        C2
     BY RET
   WITH A2

                           SUM OF                  MEAN               SIGNIF
SOURCE OF VARIATION       SQUARES     DF         SQUARE          F      OF F
COVARIATES               2889.447      1       2889.447    172.418     0.000
   A2                    2889.447      1       2889.447    172.418     0.000
MAIN EFFECTS                4.972      1          4.972      0.297     0.587
   RET                      4.972      1          4.972      0.297     0.587
EXPLAINED                2894.419      2       1447.209     86.357     0.000
RESIDUAL                 1592.046     95         16.758
TOTAL                    4486.465     97         46.252

COVARIATE   RAW REGRESSION COEFFICIENT
A2              0.712

98 CASES WERE PROCESSED.
 0 CASES (  0.0 PCT) WERE MISSING.



Table C-30

NEW VS. CONTINUING STUDENT TOEFL ANCOVA                  08/17/81    PAGE 20
FILE NONAME (CREATION DATE = 08/17/81)

* * *  M U L T I P L E   C L A S S I F I C A T I O N   A N A L Y S I S  * * *
        C2
     BY RET
   WITH A2

GRAND MEAN = 42.49
                                                    ADJUSTED FOR INDEPENDENTS
                             UNADJUSTED             + COVARIATES
VARIABLE + CATEGORY     N    DEV'N         ETA      DEV'N         BETA
RET
   0                   68    (illegible)            (illegible)
   1                   30     0.08                  (illegible)
                                           0.01                   (illegible)

MULTIPLE R SQUARED                                                0.645
MULTIPLE R                                                        0.803



Table C-31

NEW VS. CONTINUING STUDENT TOEFL ANCOVA                  08/17/81    PAGE 21

CPU TIME REQUIRED..  (illegible) SECONDS

    17 ANOVA           C2 BY RET(0,1) WITH A2,B2
    18 STATISTICS      ALL

'ANOVA' PROBLEM REQUIRES 266 BYTES OF SPACE.



Table C-32

NEW VS. CONTINUING STUDENT TOEFL ANCOVA                  08/17/81    PAGE 22
FILE NONAME (CREATION DATE = 08/17/81)

* * * * *  C E L L   M E A N S  * * * * *
        C2
     BY RET

TOTAL POPULATION
   (illegible)
   (   98)

RET
             0             1
   (illegible)   (illegible)
   (   68)       (   30)



Table C-33

NEW VS. CONTINUING STUDENT TOEFL ANCOVA                  08/17/81    PAGE 23
FILE NONAME (CREATION DATE = 08/17/81)

* * * * *  A N A L Y S I S   O F   V A R I A N C E  * * * * *
        C2
     BY RET
   WITH A2
        B2

                           SUM OF                  MEAN               SIGNIF
SOURCE OF VARIATION       SQUARES     DF         SQUARE          F      OF F
COVARIATES               3113.693      2    (illegible) (illegible)    0.000
   A2                     553.634      1        553.634     37.931     0.000
   B2                     224.246      1        224.246     15.364     0.000
MAIN EFFECTS                0.766      1          0.766      0.052     0.819
   RET                      0.766      1          0.766      0.052     0.819
EXPLAINED             (illegible)      3       1038.153     71.127     0.000
RESIDUAL                 1372.006     94         14.596
TOTAL                    4486.465     97         46.252

COVARIATE   RAW REGRESSION COEFFICIENT
A2          (illegible)
B2              0.316

98 CASES WERE PROCESSED.
 0 CASES (  0.0 PCT) WERE MISSING.



Table C-34

NEW VS. CONTINUING STUDENT TOEFL ANCOVA                  08/17/81    PAGE 24
FILE NONAME (CREATION DATE = 08/17/81)

* * *  M U L T I P L E   C L A S S I F I C A T I O N   A N A L Y S I S  * * *
        C2
     BY RET
   WITH A2
        B2

GRAND MEAN = 42.49
                                                    ADJUSTED FOR INDEPENDENTS
                             UNADJUSTED             + COVARIATES
VARIABLE + CATEGORY     N    DEV'N         ETA      DEV'N    BETA
RET
   0                   68    (illegible)            -0.06
   1                   30    (illegible)             0.13
                                           0.01              0.01

MULTIPLE R SQUARED                                  (illegible)
MULTIPLE R                                          0.833



Table C-35

NEW VS. CONTINUING STUDENT TOEFL ANCOVA                  08/17/81    PAGE 25

CPU TIME REQUIRED..  0.27 SECONDS

    19 ANOVA           C3 BY RET(0,1) WITH A3
    20 STATISTICS      ALL

'ANOVA' PROBLEM REQUIRES 182 BYTES OF SPACE.

Table C-36

NEW VS. CONTINUING STUDENT TOEFL ANCOVA                  08/17/81    PAGE 26
FILE NONAME (CREATION DATE = 08/17/81)

* * * * *  C E L L   M E A N S  * * * * *
        C3
     BY RET

TOTAL POPULATION
   (illegible)
   (   98)

RET
             0             1
   (illegible)   (illegible)
   (   68)       (   30)



Table C-37

NEW VS. CONTINUING STUDENT TOEFL ANCOVA                  08/17/81    PAGE 27
FILE NONAME (CREATION DATE = 08/17/81)

* * * * *  A N A L Y S I S   O F   V A R I A N C E  * * * * *
        C3
     BY RET
   WITH A3

                           SUM OF                  MEAN               SIGNIF
SOURCE OF VARIATION       SQUARES     DF         SQUARE          F      OF F
COVARIATES               3348.730      1       3348.730    219.678     0.000
   A3                    3348.730      1       3348.730    219.678     0.000
MAIN EFFECTS               13.539      1         13.539      0.888     0.348
   RET                     13.539      1         13.539      0.888     0.348
EXPLAINED                3362.269      2       1681.135    110.283     0.000
RESIDUAL                 1448.160     95         15.244
TOTAL                    4810.430     97         49.592

COVARIATE   RAW REGRESSION COEFFICIENT
A3              0.889

98 CASES WERE PROCESSED.
 0 CASES (  0.0 PCT) WERE MISSING.



Table C-38

NEW VS. CONTINUING STUDENT TOEFL ANCOVA                  08/17/81    PAGE 28
FILE NONAME (CREATION DATE = 08/17/81)

* * *  M U L T I P L E   C L A S S I F I C A T I O N   A N A L Y S I S  * * *
        C3
     BY RET
   WITH A3

GRAND MEAN = 42.52
                                                    ADJUSTED FOR INDEPENDENTS
                             UNADJUSTED             + COVARIATES
VARIABLE + CATEGORY     N    DEV'N         ETA      DEV'N         BETA
RET
   0                   68    (illegible)            (illegible)
   1                   30     0.15                  -0.56
                                           0.01                   0.05

MULTIPLE R SQUARED                                                0.699
MULTIPLE R                                                        0.836



Table C-39

NEW VS. CONTINUING STUDENT TOEFL ANCOVA                  08/17/81    PAGE 29

    21 ANOVA           C3 BY RET(0,1) WITH A3,B3
    22 STATISTICS      ALL

'ANOVA' PROBLEM REQUIRES 266 BYTES OF SPACE.

Table C-40

NEW VS. CONTINUING STUDENT TOEFL ANCOVA                  08/17/81    PAGE 30
FILE NONAME (CREATION DATE = 08/17/81)

* * * * *  C E L L   M E A N S  * * * * *
        C3
     BY RET

TOTAL POPULATION
     42.52
   (   98)

RET
             0             1
   (illegible)   (illegible)
   (   68)       (   30)

Table C-41

NEW VS. CONTINUING STUDENT TOEFL ANCOVA                  08/17/81    PAGE 31
FILE NONAME (CREATION DATE = 08/17/81)

* * * * *  A N A L Y S I S   O F   V A R I A N C E  * * * * *
        C3
     BY RET
   WITH A3
        B3

                           SUM OF                  MEAN               SIGNIF
SOURCE OF VARIATION       SQUARES     DF         SQUARE          F      OF F
COVARIATES            (illegible)      2       1960.421    207.353     0.000
   A3                      84.767      1         84.767      8.966     0.004
   B3                     572.112      1        572.112     60.512     0.000
MAIN EFFECTS                0.865      1          0.865      0.092     0.763
   RET                      0.865      1          0.865      0.092     0.763
EXPLAINED                3921.707      3       1307.236    138.266     0.000
RESIDUAL                  888.722     94    (illegible)
TOTAL                    4810.430     97         49.592

COVARIATE   RAW REGRESSION COEFFICIENT
A3              0.275
B3              0.653

98 CASES WERE PROCESSED.
 0 CASES (  0.0 PCT) WERE MISSING.


Table C-42

NEW VS. CONTINUING STUDENT TOEFL ANCOVA                  08/17/81    PAGE 32
FILE NONAME (CREATION DATE = 08/17/81)

* * *  M U L T I P L E   C L A S S I F I C A T I O N   A N A L Y S I S  * * *
        C3
     BY RET
   WITH A3
        B3

GRAND MEAN = 42.52
                                                    ADJUSTED FOR INDEPENDENTS
                             UNADJUSTED             + COVARIATES
VARIABLE + CATEGORY     N    DEV'N         ETA      DEV'N    BETA
RET
   0                   68    (illegible)            -0.06
   1                   30     0.15                   0.14
                                     (illegible)             0.01

MULTIPLE R SQUARED                                  0.815
MULTIPLE R                                          (illegible)



NEW VS. CONTINUING STUDENT TOEFL ANCOVA

CPU TIME REQUIRED..  0.28 SECONDS

    23 FINISH

NORMAL END OF JOB.
23 CONTROL CARDS WERE PROCESSED.
 0 ERRORS WERE DETECTED.